Introduction to OpenCL
DESCRIPTION
Brief introduction to OpenCL. Includes a demo showing the source code of a vector addition.
TRANSCRIPT
INTRODUCTION TO OPENCL
Unai Lopez
Intelligent Systems Group
Department of Computer Architecture & Technology
University of the Basque Country
Outline
1) Introduction
2) Programming Basics
3) “Hello World”
4) Final remarks
OpenCL
• Open standard for the development of data-parallel applications
• Mostly used to develop GPGPU (General-Purpose computing on Graphics Processing Units) applications
• A GPU is composed of hundreds of compute cores
• Specialized for massively data-parallel computation
(Figure: NVIDIA GTX 285, 240 compute cores; NVIDIA GT200b architecture)
OpenCL & GPGPU
• GPGPU: take advantage of the GPU's computing power to build massively parallel applications
• Parallel applications have achieved large speedups in Molecular Dynamics, Image Processing, Evolutionary Computation, …
• All cases are based on data parallelism: each thread processes a subset of the data
• For example, a vector addition:
(Figure: C = A + B computed element-wise; thread i, for thread IDs 0 to 11, computes C[i] = A[i] + B[i])
• Furthermore, OpenCL provides portability: the same code can run on different architectures
• For example:
(Figure: the same OpenCL code running on different devices)
  • Intel Xeon Phi: 50 cores @ 1 GHz
  • Intel Core i5 CPU: 4 cores @ 2.5 GHz
  • AMD HD 6950 GPU: 1408 cores @ 800 MHz
  • STI Cell B/E: 8 cores @ 3.2 GHz
OpenCL
• Provides the following abstraction: a compute device is composed of compute units
• OpenCL platform: host + compute devices
• Each manufacturer provides an SDK:
  • NVIDIA SDK for GPUs
  • AMD APP for CPUs/GPUs
  • Intel SDK for CPUs
  • IBM SDK for PowerPC and Cell B/E
Programming Basics
• Kernel: function that defines the behavior of each thread (work-item)
• Written in OpenCL C: a C99-based language plus a set of built-in functions, e.g.:
  • get_global_id: obtains the thread's global index
  • barrier: synchronizes the threads of a work-group
• For example, a kernel for vector addition:

```c
__kernel void sumKernel(__global const int* a,
                        __global const int* b,
                        __global int* c)
{
    int i = get_global_id(0);  // global index of this thread
    c[i] = a[i] + b[i];        // each thread adds one pair of elements
}
```
Programming Basics
• An OpenCL application consists of:
  • Kernel file (OpenCL C): the problem computation
  • Host code (C): kernel management
• Basic host application flow:
  1. Load and compile the kernel
  2. Copy data from host to device (e.g. from CPU to GPU)
  3. Execute the kernel
  4. Copy data from device to host
  5. Release kernels and data from device memory
• Commands are executed through a command queue on each device
Programming Basics
• Host code: programmed using the OpenCL API
• API calls, such as:
  • clCreateProgramWithSource: load kernel source from a char* string
  • clBuildProgram: compile the kernel
  • clSetKernelArg: set a kernel argument for the device
  • clEnqueueWriteBuffer / clEnqueueReadBuffer: copy a data vector to / from the device
  • clEnqueueNDRangeKernel: launch the kernel on the device
• API types, such as:
  • cl_mem: handle to a device memory object
  • cl_program: program object holding the kernel source and its compiled form
  • cl_float / cl_int / cl_uint: redefinitions of C types with fixed sizes
Hello World
• Implementation of a simple vector addition in OpenCL
• Queries the system for its default platform and device
• Edit the Makefile with the proper OpenCL paths for each system
• Run: vectorAdd <size_of_vector>
Final Remarks
• OpenCL provides functional portability, but not performance portability: code tuned for one device may run slowly on another
• Alternative to NVIDIA CUDA, the programming model for NVIDIA GPU cards
• Combinable with other parallel programming models:
  • OpenMP for SMPs / MPI for MPPs
• Large GPGPU ecosystem around OpenCL, e.g. OpenACC: develop GPGPU applications using compiler directives
```c
#pragma acc kernels
for (i = 0; i < N; i++)
    c[i] = b[i] + a[i];
```
More about OpenCL
• Before starting to develop, take a look at:
  • Contexts, command queues, events, …
• Documentation:
  • Khronos Group: maintainers of OpenCL
  • OpenCL best-practices guides in the NVIDIA CUDA / AMD SDKs
  • Programming Massively Parallel Processors (book on CUDA)
• OpenCL sample applications:
  • Most SDKs include example OpenCL applications
  • Rodinia: http://lava.cs.virginia.edu/wiki/rodinia
  • Parboil: http://impact.crhc.illinois.edu/parboil.aspx