OpenCL Kernels
This discussion of OpenCL kernels builds on the information provided in the C/C++ Kernels topic. The
same programming techniques for accelerating the performance of a kernel apply to both C/C++
and OpenCL kernels. However, the OpenCL kernel uses the __attribute__ syntax in
place of pragmas. For details of the available attributes, refer to OpenCL Attributes.
The following code examples show some of the elements of an OpenCL kernel for the Vitis application acceleration development flow. This is not intended to be a primer on OpenCL or kernel development, but merely to highlight some of the key differences between OpenCL and C/C++ kernels.
Kernel Signature
In C/C++ kernels, the kernel is identified on the Vitis compiler command line with the v++ --kernel option. In OpenCL code, however, the __kernel keyword identifies a kernel in the source. You can define multiple kernels in a single .cl file, and the Vitis compiler compiles all of them unless you specify the --kernel option to identify which kernel to compile.
__kernel __attribute__ ((reqd_work_group_size(1, 1, 1)))
void apply_watermark(__global const TYPE * __restrict input,
__global TYPE * __restrict output, int width, int height)
{
...
}
The complete code for this kernel, apply_watermark, can be found in the Global Memory Two Banks (CL) example in the Vitis Examples repository on GitHub.
In the example above, the apply_watermark kernel has two pointer arguments, input and output, and two scalar int arguments, width and height.
In C/C++ kernels, these arguments would need to be identified with HLS INTERFACE pragmas.
In an OpenCL kernel, however, the Vitis compiler and Vivado HLS recognize the kernel
arguments and compile them as needed: pointer arguments into m_axi interfaces, and scalar
arguments into s_axilite interfaces.
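For comparison, the following is a minimal sketch of how the same signature might look as a C/C++ kernel, where the interfaces have to be stated explicitly. Several details are assumptions made for illustration and do not come from the example design: TYPE is defined as int, the bundle names gmem and control are placeholders, and the loop body is a stand-in copy because the real kernel body is elided above.

```cpp
// Assumption: TYPE is not defined in the text above; int stands in for it.
typedef int TYPE;

// Hypothetical C/C++ counterpart of the OpenCL signature shown above.
// extern "C" keeps the function name unmangled so the kernel can be
// selected by name.
extern "C" void apply_watermark(const TYPE *input, TYPE *output,
                                int width, int height) {
// Pointer arguments: m_axi interfaces (bundle name is an assumption).
#pragma HLS INTERFACE m_axi port = input offset = slave bundle = gmem
#pragma HLS INTERFACE m_axi port = output offset = slave bundle = gmem
// Base addresses of the pointers, the scalars, and the return port:
// s_axilite interfaces (bundle name is an assumption).
#pragma HLS INTERFACE s_axilite port = input bundle = control
#pragma HLS INTERFACE s_axilite port = output bundle = control
#pragma HLS INTERFACE s_axilite port = width bundle = control
#pragma HLS INTERFACE s_axilite port = height bundle = control
#pragma HLS INTERFACE s_axilite port = return bundle = control

    // Placeholder body: copies the image unchanged. The real kernel body
    // is not shown in this topic.
    for (int i = 0; i < width * height; ++i) {
        output[i] = input[i];
    }
}
```

A standard host compiler ignores the HLS pragmas, so the function can be checked in software before it is passed to the Vitis compiler. In the OpenCL version none of these pragma lines are needed.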
Kernel Optimizations
Because the kernel is running in programmable logic on the target platform, optimizing your
task to the environment is an important element of application design. Most of the
optimization techniques discussed in C/C++ Kernels can be applied to
OpenCL kernels. Instead of applying the HLS pragmas
used for C/C++ kernels, you will use the __attribute__ keyword described in
OpenCL Attributes. Following is an example:
// Process the whole image
__attribute__((xcl_pipeline_loop))
image_traverse: for (uint idx = 0, x = 0 , y = 0 ; idx < size ; ++idx, x+= DATA_SIZE)
{
...
}
The example above specifies that the for loop,
image_traverse, should be pipelined to improve the performance of the
kernel. The target initiation interval (II) in this case is 1. Refer to xcl_pipeline_loop for more
information.
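For comparison, a C/C++ kernel makes the same pipelining request with a pragma placed inside the loop body. The sketch below is illustrative only: sum_pixels and its accumulation are hypothetical and are not part of the example design; only the loop label is borrowed from the code above.

```cpp
// Hypothetical helper, not taken from the example design: sums `size`
// values using a loop that requests pipelining with a target II of 1.
extern "C" int sum_pixels(const int *data, int size) {
    int total = 0;
image_traverse:
    for (int idx = 0; idx < size; ++idx) {
// C/C++ counterpart of __attribute__((xcl_pipeline_loop))
#pragma HLS PIPELINE II = 1
        total += data[idx];
    }
    return total;
}
```

The attribute form precedes the loop statement, whereas the pragma form sits inside the loop body it applies to.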
In the following code example, the watermark loop uses the
opencl_unroll_hint attribute to ask the Vitis compiler to unroll the loop, which reduces latency and improves performance.
In this case, however, the __attribute__ is only a suggestion that the
compiler can ignore if needed. Refer to opencl_unroll_hint for details.
//Unrolling below loop to process all 16 pixels concurrently
__attribute__((opencl_unroll_hint))
watermark: for ( int i = 0 ; i < DATA_SIZE ; i++)
{
...
}
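For comparison, the C/C++ counterpart of the unroll hint is the HLS UNROLL pragma, again placed inside the loop body. This is a sketch under assumptions: add_offset and its arithmetic are hypothetical, and DATA_SIZE is set to 16 only because the comment above mentions 16 pixels.

```cpp
// Assumption: 16 matches the "all 16 pixels" comment in the example above.
#define DATA_SIZE 16

// Hypothetical helper, not taken from the example design: adds a constant
// offset to DATA_SIZE values in a loop that requests full unrolling.
extern "C" void add_offset(const int *in, int *out, int offset) {
watermark:
    for (int i = 0; i < DATA_SIZE; i++) {
// C/C++ counterpart of __attribute__((opencl_unroll_hint))
#pragma HLS UNROLL
        out[i] = in[i] + offset;
    }
}
```

Unrolling is practical here because the trip count, DATA_SIZE, is a compile-time constant, so each iteration can become its own copy of the loop body in hardware.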
For more information, you can review the OpenCL Attributes topics to see what specific optimizations are supported for OpenCL kernels, and review the C/C++ Kernels content to see how these optimizations can be applied in your kernel design.