OpenCL Programming 101
OpenCL Programming, Part 1. This lecture reviews host programming in OpenCL, a Khronos standard for parallel programming.
www.dsp-ip.com
Fast Forward Your Development
OpenCL
Host Programming
OPENCL™ EXECUTION MODEL
OpenCL™ Execution Model
•Kernel
▫ Basic unit of executable code - similar to a C function
▫ Data-parallel or task-parallel
▫ A whole encoder (H.264Encode) is not a kernel: it is too large
▫ A kernel should be a small, separate function, e.g. SAD (sum of absolute differences)
•Program
▫ Collection of kernels and other functions
▫ Analogous to a dynamic library
•Applications queue kernel execution instances
▫ Queued in-order
▫ Executed in-order or out-of-order
Data-Parallelism in OpenCL™
•Define N-dimensional computation domain (N = 1, 2 or 3)
▫ Each independent element of execution in the N-D domain is called a work-item
▫ The N-D domain defines the total number of work-items that execute in parallel
•Example: 1024 x 1024 image
▫ Problem dimensions: 1024 x 1024
▫ 1 kernel execution per pixel: 1,048,576 total executions
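As a rough illustration of the 1024 x 1024 example, the following host-only C++ sketch enumerates a 2-D domain and counts its work-items. It does not use the OpenCL runtime: `WorkItemId` and `for_each_work_item` are hypothetical names introduced here, and the loop runs sequentially where a real device would run work-items in parallel.

```cpp
#include <cstddef>

// Hypothetical stand-in for the values that get_global_id(0) and
// get_global_id(1) would return inside a kernel.
struct WorkItemId {
    std::size_t x;
    std::size_t y;
};

// Visit every point of a width x height domain exactly once and return the
// number of work-items visited. Sequential emulation only.
template <typename Body>
std::size_t for_each_work_item(std::size_t width, std::size_t height, Body body) {
    std::size_t count = 0;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            body(WorkItemId{x, y});
            ++count;
        }
    }
    return count;
}
```

Calling `for_each_work_item(1024, 1024, body)` invokes `body` once per pixel, which is the "1 kernel execution per pixel" of the example.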
Scalar:
void
scalar_mul(int n,
           const float *a,
           const float *b,
           float *result)
{
    int i;
    for (i = 0; i < n; i++)
        result[i] = a[i] * b[i];
}
Data-Parallel:
kernel void
dp_mul(global const float *a,
       global const float *b,
       global float *result)
{
    int id = get_global_id(0);
    result[id] = a[id] * b[id];
}
// execute dp_mul over "n" work-items
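To see why the scalar loop and the data-parallel kernel compute the same thing, here is a host-only C++ sketch that emulates launching `dp_mul` over n work-items. It is an emulation under stated assumptions, not OpenCL code: `dp_mul_work_item` and `run_dp_mul` are hypothetical names, and the explicit `id` parameter stands in for `get_global_id(0)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar version: one call loops over all n elements.
void scalar_mul(std::size_t n, const float* a, const float* b, float* result) {
    for (std::size_t i = 0; i < n; ++i) {
        result[i] = a[i] * b[i];
    }
}

// Body of a single work-item. The `id` parameter replaces get_global_id(0),
// which only exists inside real OpenCL kernels.
void dp_mul_work_item(std::size_t id, const float* a, const float* b, float* result) {
    result[id] = a[id] * b[id];
}

// Hypothetical emulation of running dp_mul over a.size() work-items.
// The ids are visited in reverse on purpose: a real device may run
// work-items in any order, and the result does not depend on it because
// each work-item writes only its own element.
std::vector<float> run_dp_mul(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> result(a.size());
    for (std::size_t id = a.size(); id-- > 0;) {
        dp_mul_work_item(id, a.data(), b.data(), result.data());
    }
    return result;
}
```

The loop over `i` has moved out of the function body and into the launch: the kernel describes one element, and the runtime supplies the index.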
Compiling Kernels
• Create a program
▫ Input: String (source code) or precompiled binary
▫ Analogous to a dynamic library: A collection of kernels
• Compile the program
▫ Specify the devices for which kernels should be compiled
▫ Pass in compiler flags
▫ Check for compilation/build errors
• Create the kernels
▫ Returns a kernel object used to hold arguments for a given execution
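The "check for compilation/build errors" step is easier to act on when the numeric status is turned into a name. A minimal sketch, assuming the status values below match the Khronos `CL/cl.h` header: they are redeclared locally only so the example compiles without an OpenCL SDK, and real host code should include the header instead. `status_name` is a hypothetical helper, not part of OpenCL.

```cpp
#include <string>

// Local mirrors of a few OpenCL status codes. Assumption: these values match
// the definitions in CL/cl.h; use that header in real code.
constexpr int kClSuccess = 0;
constexpr int kClDeviceNotFound = -1;
constexpr int kClCompilerNotAvailable = -3;
constexpr int kClBuildProgramFailure = -11;
constexpr int kClInvalidKernelName = -46;

// Translate a status code into a readable name for log messages.
std::string status_name(int status) {
    switch (status) {
        case kClSuccess:              return "CL_SUCCESS";
        case kClDeviceNotFound:       return "CL_DEVICE_NOT_FOUND";
        case kClCompilerNotAvailable: return "CL_COMPILER_NOT_AVAILABLE";
        case kClBuildProgramFailure:  return "CL_BUILD_PROGRAM_FAILURE";
        case kClInvalidKernelName:    return "CL_INVALID_KERNEL_NAME";
        default:
            return "unknown status (" + std::to_string(status) + ")";
    }
}
```

An error message such as `"Program::build() failed (" << status_name(err) << ")"` then reads without looking up the number.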
EX-1: OPENCL "HELLO WORLD"
Basic Program Structure
• Include
• Get Platform Info
• Create Context
• Load & compile program
• Create Queue
• Load and Run Kernel
Includes
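#include <cstdio>
#include <cstdlib>
#include <cstring>   // strcmp, used when matching the platform vendor
#include <iostream>
#include <vector>    // std::vector of platforms and devices
#include <SDKFile.hpp>
#include <SDKCommon.hpp>
#include <SDKApplication.hpp>
#include <CL/cl.hpp>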
• Make sure to include all the OpenCL header files the program needs
Get Platform Info
cl_int err;
std::vector<cl::Platform> platforms;
err = cl::Platform::get(&platforms);
if (err != CL_SUCCESS)
{
    std::cerr << "Platform::get() failed (" << err << ")" << std::endl;
    return SDK_FAILURE;
}
// Look for the AMD platform; on a match, i is left pointing at it
std::vector<cl::Platform>::iterator i;
if (platforms.size() > 0)
{
    for (i = platforms.begin(); i != platforms.end(); ++i)
    {
        if (!strcmp((*i).getInfo<CL_PLATFORM_VENDOR>(&err).c_str(),
                    "Advanced Micro Devices, Inc."))
        { break; }
    }
}
• Lists the OpenCL platforms installed in the system and picks the AMD one
▫ A platform exposes the OpenCL "Devices": CPUs, GPUs & DSPs
Create Context
cl_context_properties cps[3] =
    { CL_CONTEXT_PLATFORM, (cl_context_properties)(*i)(), 0 };
std::cout << "Creating a context on the AMD platform\n";
// Context over the CPU devices of the selected platform
cl::Context context(CL_DEVICE_TYPE_CPU, cps, NULL, NULL, &err);
if (err != CL_SUCCESS)
{
    std::cerr << "Context::Context() failed (" << err << ")\n";
    return SDK_FAILURE;
}
• A context ties devices together: command queues (operations) are created in it, and memory can be shared between its devices
Load Program
std::cout << "Loading and compiling CL source\n";
streamsdk::SDKFile file;
if (!file.open("HelloCL_Kernels.cl"))
{
    std::cerr << "We couldn't load CL source code\n";
    return SDK_FAILURE;
}
cl::Program::Sources
    sources(1, std::make_pair(file.source().data(),
                              file.source().size()));
cl::Program program = cl::Program(context, sources, &err);
if (err != CL_SUCCESS)
{
    std::cerr << "Program::Program() failed (" << err << ")\n";
    return SDK_FAILURE;
}
• Loads the kernel source (*.cl) and wraps it in a program object
Compile program
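// Devices that belong to the context created earlier
std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
err = program.build(devices);
if (err != CL_SUCCESS) {
    if (err == CL_BUILD_PROGRAM_FAILURE)
    {
        // Handle error: print the compiler output for the first device
        std::cerr << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0]) << "\n";
    }
    std::cerr << "Program::build() failed (" << err << ")\n";
    return SDK_FAILURE;
}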
• The host program compiles the kernel program for each device.
• Why compile at run time? As with Java, the target device is not known until the program runs. We can also decide at run time, based on load balancing, which device to run on.
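To make the load-balancing remark concrete, here is a host-only sketch of choosing a device at run time. It assumes a hypothetical `DeviceLoad` record that the application fills in itself; how the load figures are measured is outside this sketch, and `pick_device` is an illustrative name, not an OpenCL call.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-device record maintained by the application.
struct DeviceLoad {
    std::string name;  // label for the device, e.g. "CPU" or "GPU"
    double busy;       // fraction of time busy: 0.0 (idle) to 1.0 (saturated)
};

// Return the index of the least busy device, or -1 if the list is empty.
// On a tie, the device that appears first wins.
int pick_device(const std::vector<DeviceLoad>& devices) {
    int best = -1;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (best < 0 || devices[i].busy < devices[static_cast<std::size_t>(best)].busy) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Because the program is built at run time, the chosen index can be applied after the fact: the same kernel source is compiled for whichever device wins.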
Create Kernel with program
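cl::Kernel kernel(program, "hello", &err);
if (err != CL_SUCCESS)
{
    std::cerr << "Kernel::Kernel() failed (" << err << ")\n";
    return SDK_FAILURE;
}
// This check belongs after a kernel.setArg() call, which the slide does not
// show; as written, err still holds the constructor's status
if (err != CL_SUCCESS) {
    std::cerr << "Kernel::setArg() failed (" << err << ")\n";
    return SDK_FAILURE;
}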
• Creates a kernel object for the "hello" kernel in our loaded and compiled program
Create Queue per device & Run it
cl::CommandQueue queue(context, devices[0], 0, &err);
std::cout << "Running CL program\n";
err = queue.enqueueNDRangeKernel(…..);
// finish() blocks until every queued command has completed
err = queue.finish();
if (err != CL_SUCCESS) {
    std::cerr << "CommandQueue::finish() failed (" << err << ")\n";
}
• Enqueues the kernel for execution. Execution does not have to happen immediately
• Attention: enqueueNDRangeKernel() is an asynchronous call: its return does not imply that the kernel has executed, or even started to execute. queue.finish() blocks until all queued commands have completed
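The gap between enqueueing and executing can be shown with a toy queue in plain C++. This is a sketch under stated assumptions, not the OpenCL implementation: `ToyQueue` is a hypothetical class that defers all work until `finish()`, whereas a real runtime is free to start commands at any point after they are enqueued.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for an in-order command queue. enqueue() only
// records work; finish() runs everything recorded so far, in submission order.
class ToyQueue {
public:
    // Returns immediately: the command has not run when this call returns.
    void enqueue(std::function<void()> command) {
        pending_.push(std::move(command));
    }

    // Blocks until every pending command has been executed.
    void finish() {
        while (!pending_.empty()) {
            std::function<void()> command = std::move(pending_.front());
            pending_.pop();
            command();
        }
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::queue<std::function<void()>> pending_;
};
```

Host code that reads results right after `enqueue()` would see stale data here, which is the same mistake as reading a buffer before `queue.finish()` (or another synchronization point) in the real API.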
And That's All, Folks?
• Not yet. We still need to learn:
▫ Writing kernel functions
▫ Synchronizing kernel functions
▫ Setting arguments to kernel functions
▫ Passing data from/to the host
References
• “OpenCL Hello World” is an ATI OpenCL SDK programming exercise
• ATI OpenCL slides
DSP-IP Contact information
Download slides at: www.dsp-ip.com
Course materials & lecture request
www.dsp-ip.com
Mail: [email protected]
Phone: +972-9-8850956, Fax: +972-50-8962910
Yossi Cohen
+972-9-8850956