Introduction to OpenCL programming
Nasos Iliopoulos
George Mason University, resident at Computational
Multiphysics Systems Lab.
Center of Computational Material Science
Naval Research Laboratory
Washington, DC, USA
ASME 2012 International Design Engineering Technical Conferences &
Computers and Information in Engineering Conference, IDETC/CIE 2012
August 12-15, 2012, Chicago, Illinois, USA
OpenCL overview
• Industry-accepted standard; vendors provide implementations.
• Take advantage of massively parallel execution to accelerate computations.
• Cross-platform in a wide sense:
Multiple OSes (Linux, Windows, OS X).
Multiple Devices (GPUs, CPUs, …).
Multiple Vendors (AMD, nVidia, Intel, Apple, ...).
• C-like syntax.
OpenCL main differences from CUDA
• (+) CUDA is supported only by nVidia.
• (+) OpenCL has a diverse ecosystem.
• (+) OpenCL runs on GPUs and CPUs.
• (+) OpenCL runs on AMD and nVidia GPUs.
• (+) OpenCL uses the native compiler.
• (-) OpenCL is slightly slower than CUDA on nVidia GPUs (~5%).
OpenCL hierarchy of models
• Platform model (host + OpenCL devices)
• Execution model (kernels, i.e. functions, and programs)
• Memory model (storage of arrays in buffers)
• Programming model (data parallel or task parallel)
OpenCL Platform model
Host (e.g. PC)
  Compute device (e.g. GPU, CPU, …)
    Compute unit: executes work-groups, which are collections of work-items
      Processing element: virtual processor that executes work-items
OpenCL Execution Model
• Kernel :
Analogous to a function
• Program:
Collection of kernels
Analogous to a library of functions
• Application queue
Kernels queued in order
Kernels executed in-order or out-of-order
Managed at the Device level
Managed at the Host level
OpenCL memory model
• Processing element (PE): a virtual processor that maps to a physical processor at some point in time.
• Compute unit: often loosely referred to as a "work-group". Strictly, a work-group is the set of work-items that a single compute unit executes.
OpenCL memory model (complete hierarchy)
Compute Device
  Compute unit 1
    PE 1 … PE N, each with its own Private Memory (1 … N)
    Local Memory 1
  Compute unit N
    PE 1 … PE N, each with its own Private Memory (1 … N)
    Local Memory N
  Global / Constant Memory Data Cache
Global Memory
Constant Memory
Programming Model
• Supports two programming models: data parallel and task parallel.
• Data parallel: processing elements execute the same task on different pieces of distributed data. Example: array increment.
  … 6 2 3 5 5  →  … 7 3 4 6 6
  Element increment is processed in parallel.
• Task parallel: each processing element executes a different task on the same or different data.
  Task A (array 1: increment):  … 6 2 3 5 5  →  … 7 3 4 6 6
  Task B (array 2: mult. by 2): … 4 2 1 3 4  →  … 8 4 2 6 8
  Tasks A and B are executed in parallel.
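The two examples on this slide can be written down as a small C++ sketch. It runs serially and only illustrates how the work decomposes, not how a device schedules it. The function names `incrementAll` and `doubleAll` are illustrative and do not come from the slides.

```cpp
#include <algorithm>
#include <vector>

// Task A (data parallel inside): add 1 to every element.
// Each element update is independent of the others, which is what
// allows a device to give each one to a different processing element.
std::vector<int> incrementAll(std::vector<int> data) {
    std::transform(data.begin(), data.end(), data.begin(),
                   [](int x) { return x + 1; });
    return data;
}

// Task B: multiply every element by 2.
// Task A and Task B touch different arrays, so the two calls are
// independent of each other; that independence is the task-parallel part.
std::vector<int> doubleAll(std::vector<int> data) {
    std::transform(data.begin(), data.end(), data.begin(),
                   [](int x) { return x * 2; });
    return data;
}
```

Calling `incrementAll` on `6 2 3 5 5` and `doubleAll` on `4 2 1 3 4` reproduces the two result rows shown on the slide.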
OpenCL execution process
• Create an OpenCL context bound to a Device type.
• Create a command queue on one of the devices of the context.
• Allocate and create memory buffer objects.
• Create and build the OpenCL program.
• Create a kernel object from the kernels in the program.
• Execute the kernel.
• Read results if needed.
• Clean up.
OpenCL example – array increment
• Array increment of an array with numElements entries:
… 6 2 3 5 5  →  … 7 3 4 6 6
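OpenCL example – array increment
Array increment
C++ - SERIAL VERSION
#include <cstddef>
void
aInc( const unsigned int n,
      float *a) {
  // visit every element in turn
  for (std::size_t i=0; i!=n; i++)
    a[i]=a[i]+1.0f;
}
OpenCL VERSION
__kernel void
aInc( const unsigned int n,        // scalar arguments take no __global qualifier
      __global float *a) {
  unsigned int i=get_global_id(0); // each work-item handles one element
  if (i<n)                         // the global size may be larger than n
    a[i] = a[i]+1.0f;              // float literal avoids double arithmetic
}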
• A kernel can be thought of as the body of a for-loop.
• Note how indexing works in the OpenCL version: there is no loop counter; each work-item obtains its own index from get_global_id(0).
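To make the "body of a for-loop" analogy concrete, here is a minimal host-only C++ sketch that emulates the launch serially. It does not use the OpenCL API. The names `aIncWorkItem` and `launchAInc` are hypothetical, and `gid` stands in for the value that `get_global_id(0)` would return inside a real kernel.

```cpp
#include <cstddef>
#include <vector>

// What one work-item does: the kernel body, with the index passed in.
void aIncWorkItem(std::size_t gid, std::size_t n, std::vector<float>& a) {
    if (gid < n)                 // guard: padded work-items must do nothing
        a[gid] = a[gid] + 1.0f;
}

// Hypothetical serial "launcher": calls the body once per work-item.
// A real device runs these calls in parallel and in no guaranteed order.
void launchAInc(std::size_t globalSize, std::size_t n, std::vector<float>& a) {
    for (std::size_t gid = 0; gid != globalSize; ++gid)
        aIncWorkItem(gid, n, a);
}
```

Launching with a `globalSize` larger than the array length (for example 8 work-items for 5 elements) shows why the `if (i<n)` test in the kernel matters: the extra work-items fall through the guard instead of writing past the end of the array.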
OpenCL example – array increment
Typical compilation setup
• Include the OpenCL header:
#include <CL/opencl.h>
• Compiler include paths (e.g. nVidia SDK):
-I$SDK_PATH/OpenCL/common/inc -I$SDK_PATH/shared/inc
• Link libraries:
-lOpenCL
OpenCL example – array increment
Initialization
• Get an OpenCL platform:
error = clGetPlatformIDs(1, &cpPlatform, NULL);
if (error != CL_SUCCESS) { /* Error handling */ }
• Get the devices:
error = clGetDeviceIDs(cpPlatform, CL_DEVICE_TYPE_GPU, 1, &cdDevice, NULL);
• Create the context:
cxGPUContext = clCreateContext(0, 1, &cdDevice, NULL, NULL, &error);
• Create a command queue:
cqCommandQueue = clCreateCommandQueue(cxGPUContext, cdDevice, 0, &error);
OpenCL example – array increment
Compile the kernel
• The kernel source is held on the host as a string:
cSourceCL =
__kernel void
aInc( const unsigned int n,
      __global float *a) {
  unsigned int i=get_global_id(0);
  if (i<n)
    a[i] = a[i]+1.0f;
}
• Create the program object:
cpProgram = clCreateProgramWithSource(cxGPUContext, 1, (const char **)&cSourceCL, &szKernelLength, &error);
• Compile the program:
error = clBuildProgram(cpProgram, 0, NULL, NULL, NULL, NULL);
• Create the kernel:
ckKernel = clCreateKernel(cpProgram, "aInc", &error);
OpenCL example – array increment
Load some data to the GPU
• Create and fill an array on the host:
std::vector<float> a_host(szGlobalWorkSize);
for (std::size_t i=0; i!=numElements; i++)
  a_host[i]=static_cast<float>(i);
• Create a buffer on the GPU (read-write, because the kernel updates the array in place):
cmDevSrcA = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE,
    sizeof(cl_float) * szGlobalWorkSize, NULL, &error);
• Asynchronously copy the data to the GPU:
error = clEnqueueWriteBuffer(cqCommandQueue, cmDevSrcA, CL_FALSE, 0,
    sizeof(cl_float) * szGlobalWorkSize, &a_host[0], 0, NULL, NULL);
OpenCL example – array increment
Set kernel arguments and execute it
• Set the kernel arguments:
error = clSetKernelArg(ckKernel, 0, sizeof(cl_uint), (void*)&numElements);
error |= clSetKernelArg(ckKernel, 1, sizeof(cl_mem), (void*)&cmDevSrcA);
• Execute the kernel:
error = clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL,
    &szGlobalWorkSize, &szLocalWorkSize, 0, NULL, NULL);
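The slides do not show how szGlobalWorkSize is chosen. In OpenCL 1.x, when a local work size is passed to clEnqueueNDRangeKernel, the global work size has to be a multiple of it, so a common approach is to round numElements up. The helper below is a sketch of that rounding; the name `roundUpToMultiple` is hypothetical and is not part of the OpenCL API.

```cpp
#include <cstddef>

// Smallest multiple of localSize that is >= n.
// Assumes localSize > 0.
std::size_t roundUpToMultiple(std::size_t localSize, std::size_t n) {
    const std::size_t remainder = n % localSize;
    return remainder == 0 ? n : n + (localSize - remainder);
}
```

Under this assumption, `szGlobalWorkSize = roundUpToMultiple(szLocalWorkSize, numElements)` would explain two details of the example: the buffers are sized with szGlobalWorkSize rather than numElements, and the kernel needs the `if (i<n)` guard for the padded work-items.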
OpenCL example – array increment
Post-processing
• Get the result from the GPU (blocking read):
error = clEnqueueReadBuffer(cqCommandQueue, cmDevSrcA, CL_TRUE,
    0, sizeof(cl_float) * szGlobalWorkSize, dst, 0, NULL, NULL);
…
OpenCL example – array increment
Array increment performance: C serial version vs OpenCL
[Bar chart: execution time (sec), axis from 0 to 0.5, comparing "C Serial version - i7 @3.9GHz" with "OpenCL - Tesla c1060".]
About 9x speedup.
OpenCL example – array reversal
… 6 2 3 5 5  →  5 5 3 2 6 …
OpenCL example – array reversal
A simple kernel
__kernel void
ArrayRev( __global const float* in,
          __global float *out,
          int iNumElements)
{
  // get index into global data array
  const int iGID = get_global_id(0);
  // bound check
  if (iGID >= iNumElements) return;
  // Run "out" reversely
  const int oGID = iNumElements - iGID - 1;
  out[oGID] = in[iGID];
}
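The index mapping of this kernel can be checked on the host without an OpenCL device. The following C++ sketch emulates the launch serially; `arrayRevWorkItem` and `launchArrayRev` are hypothetical names, and `gid` plays the role of `get_global_id(0)`.

```cpp
#include <cstddef>
#include <vector>

// What one work-item of the reversal kernel does.
void arrayRevWorkItem(int gid, const std::vector<float>& in,
                      std::vector<float>& out, int numElements) {
    if (gid >= numElements) return;           // bound check
    const int oGID = numElements - gid - 1;   // mirrored position
    out[static_cast<std::size_t>(oGID)] = in[static_cast<std::size_t>(gid)];
}

// Hypothetical serial launcher; globalSize may exceed the array length.
std::vector<float> launchArrayRev(const std::vector<float>& in, int globalSize) {
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size());
    for (int gid = 0; gid < globalSize; ++gid)
        arrayRevWorkItem(gid, in, out, n);
    return out;
}
```

Each work-item reads element `gid` and writes element `numElements - gid - 1`, so neighbouring work-items read increasing addresses but write decreasing ones.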
OpenCL example – array reversal
A simple kernel: modifications on the HOST code
• Create buffers on the GPU (input read-only, output write-only):
cmDevSrcA = clCreateBuffer(cxGPUContext, CL_MEM_READ_ONLY,
    sizeof(cl_float) * szGlobalWorkSize, NULL, &error);
cmDevDstB = clCreateBuffer(cxGPUContext, CL_MEM_WRITE_ONLY,
    sizeof(cl_float) * szGlobalWorkSize, NULL, &error);
• Set the kernel arguments:
error = clSetKernelArg(ckKernel, 0, sizeof(cl_mem), (void*)&cmDevSrcA);
error |= clSetKernelArg(ckKernel, 1, sizeof(cl_mem), (void*)&cmDevDstB);
error |= clSetKernelArg(ckKernel, 2, sizeof(cl_int), (void*)&numElements);
OpenCL example – array reversal
Array reversal performance: C serial version vs OpenCL
[Bar chart: execution time (sec), axis from 0 to 0.09, comparing "C Serial version - i7 @3.9GHz" with "OpenCL - Tesla c1060".]
About 2.3x speedup.
OpenCL example – array reversal
Why?
SIMPLE ARRAY INCREMENT CASE
[Diagram: global memory → 16-word packets → local memory → Thread 1, Thread 2, … → back to global memory]
![Page 62: Introduction to OpenCL programming · Introduction to OpenCL programming Nasos Iliopoulos George Mason University, resident at Computational Multiphysics Systems Lab. Center of Computational](https://reader030.vdocument.in/reader030/viewer/2022040123/5e193a91dd68a86f966720ca/html5/thumbnails/62.jpg)
OpenCL example – array reversal
Why?
ARRAY REVERSAL CASE
[Diagram: separate global memory → local memory → back to global memory transfers drawn for Thread 10 and for Thread 538, …]
Waste of memory operations
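The contrast between the two cases can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions, not a description of any particular GPU: it assumes the strict rule used by early CUDA-class hardware, where a group of 16 threads is served by a single transaction only if thread k touches word k of an aligned 16-word packet, and it assumes the naive reversal has the form `out[n-1-i] = in[i]`. The names `transactions`, `increment_cost` and `naive_reversal_write_cost` are hypothetical. Real devices, including newer ones, apply more relaxed rules, so the counts are illustrative only.

```c
#include <assert.h>

#define PACKET_WORDS 16 /* words per memory packet, as drawn in the slides */

/* Simplified, hypothetical cost model: one transaction if thread k accesses
   word k of an aligned packet, otherwise one transaction per thread. */
static int transactions(const int addr[PACKET_WORDS])
{
    assert(addr[0] >= 0);
    if (addr[0] % PACKET_WORDS != 0) return PACKET_WORDS; /* misaligned start */
    for (int k = 1; k < PACKET_WORDS; ++k)
        if (addr[k] != addr[0] + k) return PACKET_WORDS;  /* out of sequence */
    return 1;
}

/* Array increment: thread i reads and writes element i. */
static int increment_cost(int first_thread)
{
    int addr[PACKET_WORDS];
    for (int k = 0; k < PACKET_WORDS; ++k) addr[k] = first_thread + k;
    return transactions(addr);
}

/* Naive reversal (assumed form out[n-1-i] = in[i]): cost of the write side. */
static int naive_reversal_write_cost(int first_thread, int n)
{
    int addr[PACKET_WORDS];
    for (int k = 0; k < PACKET_WORDS; ++k) addr[k] = n - 1 - (first_thread + k);
    return transactions(addr);
}
```

Under this model, `increment_cost(32)` is 1, while `naive_reversal_write_cost(32, 1024)` is 16, because the 16 threads write their words in descending order.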
![Page 67: Introduction to OpenCL programming · Introduction to OpenCL programming Nasos Iliopoulos George Mason University, resident at Computational Multiphysics Systems Lab. Center of Computational](https://reader030.vdocument.in/reader030/viewer/2022040123/5e193a91dd68a86f966720ca/html5/thumbnails/67.jpg)
OpenCL example – array reversal
Solution: Bring data into local memory in order to achieve coalesced global memory access
[Diagram: input array → local memory → output array; the part handled by one workgroup is marked]
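The index bookkeeping behind this scheme can be worked out explicitly. The symbols are introduced here for illustration and do not appear on the slides: N is the array length, L the workgroup size, g the workgroup index, and l the local index of a work item (0 ≤ l < L).

```latex
\begin{align*}
i &= gL + l
  && \text{input element read by work item } (g, l) \\
s &= L - 1 - l
  && \text{local-memory slot it is stored in (block reversed)} \\
o(g, l) &= N - 1 - \bigl(gL + (L - 1 - l)\bigr)
  && \text{output position of the element held in local slot } l \\
        &= N - (g + 1)L + l
\end{align*}
```

For N = 8 and L = 4, workgroup g = 0 writes output positions 4 to 7 and workgroup g = 1 writes positions 0 to 3. In both the read and the write, consecutive values of l map to consecutive addresses, and only the accesses to local memory run in reverse order.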
![Page 78: Introduction to OpenCL programming · Introduction to OpenCL programming Nasos Iliopoulos George Mason University, resident at Computational Multiphysics Systems Lab. Center of Computational](https://reader030.vdocument.in/reader030/viewer/2022040123/5e193a91dd68a86f966720ca/html5/thumbnails/78.jpg)
OpenCL example – array reversal
An improved kernel
[Diagram: input array → local memory → output array]
__kernel void
ArrayRev(__global const float *in,
         __global float *out,
         __local float *shared,
         int iNumElements)
{
    const int lid = get_local_id(0);      // index of this work item within its workgroup
    const int lsize = get_local_size(0);  // number of work items per workgroup

    // Contiguous read from global memory; the block is stored reversed in local memory.
    // Note: this read is unguarded, so the input buffer is assumed to hold at least
    // get_global_size(0) elements.
    shared[lsize - lid - 1] = in[get_global_id(0)];

    // Wait until ALL threads have finished fetching data to local memory.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Output position for local slot lid: the block lands mirrored in the output array.
    int oGID = iNumElements - ((int)get_group_id(0) + 1) * lsize + lid;
    if (oGID < 0) return;  // slot holds padding from beyond the end of the input
    out[oGID] = shared[lid];
}
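To check the indexing without a GPU, the kernel can be emulated on the CPU. The following is a minimal sketch in plain C, not part of the original slides: the function name `array_rev_emulated` is hypothetical, the outer loop stands in for the workgroups, and the boundary between the two inner loops plays the role of `barrier(CLK_LOCAL_MEM_FENCE)`. It assumes, like the kernel, that `in` is padded to a whole number of workgroups.

```c
#include <assert.h>

/* Emulate one launch of ArrayRev on the CPU.
   n          : number of valid elements (iNumElements in the kernel)
   local_size : work items per workgroup
   in         : must hold at least the padded global size
   out        : must hold at least n elements
   shared     : scratch buffer of local_size floats (stands in for __local memory) */
static void array_rev_emulated(const float *in, float *out, float *shared,
                               int n, int local_size)
{
    assert(n >= 0 && local_size > 0);
    const int num_groups = (n + local_size - 1) / local_size; /* round up */

    for (int group = 0; group < num_groups; ++group) {
        /* Phase 1 (before the barrier): each work item stores its element reversed. */
        for (int lid = 0; lid < local_size; ++lid) {
            const int gid = group * local_size + lid;
            shared[local_size - lid - 1] = in[gid];
        }

        /* Phase 2 (after the barrier): contiguous write to the mirrored block. */
        for (int lid = 0; lid < local_size; ++lid) {
            const int oGID = n - (group + 1) * local_size + lid;
            if (oGID < 0) continue; /* slot holds padding: nothing to write */
            out[oGID] = shared[lid];
        }
    }
}
```

With 6 valid elements `{0, 1, 2, 3, 4, 5}` padded to 8 and a workgroup size of 4, the second workgroup skips its first two slots and the output is `{5, 4, 3, 2, 1, 0}`.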
![Page 79: Introduction to OpenCL programming · Introduction to OpenCL programming Nasos Iliopoulos George Mason University, resident at Computational Multiphysics Systems Lab. Center of Computational](https://reader030.vdocument.in/reader030/viewer/2022040123/5e193a91dd68a86f966720ca/html5/thumbnails/79.jpg)
OpenCL example – array reversal
An improved kernel
Modifications to the HOST code
• Define the shared array size (local to each workgroup):
    size_t shared_size = szLocalWorkSize * sizeof(cl_float);  // szLocalWorkSize: number of work items in each work group
• Set the kernel arguments:
    error  = clSetKernelArg(ckKernel, 0, sizeof(cl_mem), (void*)&cmDevSrcA);
    error |= clSetKernelArg(ckKernel, 1, sizeof(cl_mem), (void*)&cmDevDstB);
    error |= clSetKernelArg(ckKernel, 2, shared_size, NULL);  // __local argument: pass the size in bytes and a NULL pointer
    error |= clSetKernelArg(ckKernel, 3, sizeof(cl_uint), (void*)&numElements);
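The `if (oGID<0) return;` guard in the kernel suggests that the global work size may be larger than the number of elements, which is what happens when the element count is rounded up to a multiple of the workgroup size (OpenCL 1.x requires the global size to be such a multiple). The following is a minimal sketch in plain C, without the OpenCL headers, of how the two sizes might be computed. The helpers `round_up` and `shared_bytes` are hypothetical names, and `float` stands in for `cl_float`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round count up to the next multiple of local_size. */
static size_t round_up(size_t local_size, size_t count)
{
    assert(local_size > 0);
    const size_t remainder = count % local_size;
    return remainder == 0 ? count : count + local_size - remainder;
}

/* Bytes of __local memory needed by one workgroup; mirrors
   szLocalWorkSize * sizeof(cl_float) from the host code. */
static size_t shared_bytes(size_t local_size)
{
    return local_size * sizeof(float);
}
```

For 1000 elements and a workgroup size of 256, `round_up(256, 1000)` gives a global work size of 1024, and `shared_bytes(256)` gives 1024 bytes on platforms where a float is 4 bytes. Because the kernel reads `in[get_global_id(0)]` without a bounds check, this sketch assumes the input buffer is allocated with the rounded-up size.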
![Page 81: Introduction to OpenCL programming · Introduction to OpenCL programming Nasos Iliopoulos George Mason University, resident at Computational Multiphysics Systems Lab. Center of Computational](https://reader030.vdocument.in/reader030/viewer/2022040123/5e193a91dd68a86f966720ca/html5/thumbnails/81.jpg)
OpenCL example – array reversal
Array reversal performance – improved kernel
C serial version vs OpenCL
[Bar chart: execution time (sec), axis from 0 to 0.09, for "C Serial version - i7 @3.9GHz" and "OpenCL - Tesla c1060"]
About 7.4x speedup.
![Page 83: Introduction to OpenCL programming · Introduction to OpenCL programming Nasos Iliopoulos George Mason University, resident at Computational Multiphysics Systems Lab. Center of Computational](https://reader030.vdocument.in/reader030/viewer/2022040123/5e193a91dd68a86f966720ca/html5/thumbnails/83.jpg)
Suggested internet resources
OpenCL official specification: http://www.khronos.org/opencl/
SDKs / Drivers / Tutorials / Tools
• AMD: http://developer.amd.com/zones/openclzone/pages/default.aspx
• Intel: http://software.intel.com/en-us/articles/opencl-sdk/
• nVidia: http://developer.nvidia.com/opencl
• Apple: http://developer.apple.com/library/mac/#documentation/Performance/Conceptual/OpenCL_MacProgGuide/Introduction/Introduction.html
• IBM Power architecture: http://www.alphaworks.ibm.com/tech/opencl
![Page 84: Introduction to OpenCL programming · Introduction to OpenCL programming Nasos Iliopoulos George Mason University, resident at Computational Multiphysics Systems Lab. Center of Computational](https://reader030.vdocument.in/reader030/viewer/2022040123/5e193a91dd68a86f966720ca/html5/thumbnails/84.jpg)
Questions?