Migrating the Host Application
In this section, you will review a simple SDSoC™ program, with both the main() function and the accelerated function, to identify the elements that must be changed. To begin migrating an application and its hardware functions to the Vitis environment platforms and tools, examine your main() function and the hardware function code. The code presented here is the mmult example
application.
The following code snippet shows an example main() function from the original SDSoC application project.
#include <stdlib.h>
#include <iostream>
#include "mmult.h"
#include "sds_lib.h"

#define NUM_TESTS 5

void printMatrix(int *mat, int col, int row) {
  for (int i = 0; i < col; i++) {
    for (int j = 0; j < row; j++) {
      std::cout << mat[i*row+j] << "\t";
    }
    std::cout << std::endl;
  }
  std::cout << std::endl;
}

int main() {
  int col = BUFFER_SIZE;
  int row = BUFFER_SIZE;

  int *matA = (int*)sds_alloc(col*row*sizeof(int));
  int *matB = (int*)sds_alloc(col*row*sizeof(int));
  int *matC = (int*)sds_alloc(col*row*sizeof(int));

  // Seed the random number generator once, before the test loop
  srand(time(NULL));

  // Run the hardware function multiple times
  for (int i = 0; i < NUM_TESTS; i++) {
    std::cout << "Test #: " << i << std::endl;

    // Populate matA and matB
    for (int j = 0; j < col*row; j++) {
      matA[j] = rand()%10;
      matB[j] = rand()%10;
    }
    std::cout << "Mat A" << std::endl;
    printMatrix(matA, col, row);
    std::cout << "Mat B" << std::endl;
    printMatrix(matB, col, row);

    std::cout << "MatA * MatB" << std::endl;
    mmult(matA, matB, matC, col, row);
  }
  printMatrix(matC, col, row);

  sds_free(matA);
  sds_free(matB);
  sds_free(matC);
  return 0;
}
The code allocates memory for three two-dimensional matrices
stored as one-dimensional arrays, populates matA and
matB with random numbers, and multiplies matA and matB to compute
matC. The results are printed to the screen, and the
test is run NUM_TESTS (five) times.
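As a side note, the rand()%10 population pattern can also be expressed with the C++ &lt;random&gt; facilities, which avoid the global state of srand()/rand(). A minimal sketch (the randomMatrix name is illustrative, not part of the example):

```cpp
#include <random>
#include <vector>

// Fill a col*row matrix with values in [0, 9], mirroring the rand()%10
// pattern, but using a generator seeded once per matrix.
std::vector<int> randomMatrix(int col, int row, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 9);
    std::vector<int> mat(static_cast<size_t>(col) * row);
    for (auto &v : mat) v = dist(gen);
    return mat;
}
```

Seeding the generator explicitly also makes test runs reproducible, which is often useful when comparing hardware results against a software model.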
When moving to the Vitis environment,
several of the tasks that are implicitly handled by the sds++ compiler and runtime need to instead be explicitly managed by the
application developer.
Updating the Required #include Files
The following sections discuss the specific code changes in the
main() function. Start by updating the #include directives as follows.
#include <stdlib.h>
#include <iostream>
#include "mmult.h"
//#include "sds_lib.h"
#include <fstream>
#include <vector>
#include <ctime>
In this example, the main() function is compiled by the Arm® core cross-compiler. Comment out the sds_lib.h include line, because you are no longer relying on the
sds_alloc() function for memory allocation.
#define CL_HPP_CL_1_2_DEFAULT_BUILD
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY 1
#include <CL/cl2.hpp>
In this section, you #define pre-processor
macros to specify the version of the OpenCL API to
use for the application. The default settings specify the OpenCL API 2.0 framework, but the Xilinx tools support the OpenCL API
1.2 release. For more information on these pre-processor macros, refer to "OpenCL C++
Bindings" at https://github.khronos.org/OpenCL-CLHPP.
The OpenCL API provides support for both C
and C++ languages; however, you can set up the environment to use C++ over the default
C. To use the OpenCL C++ bindings, as shown in this
example code, you must #include the cl2.hpp header file.
The following mmult example code snippet is compiled
separately.
#include "mmult.h"

void mmult(int A[BUFFER_SIZE*BUFFER_SIZE], int B[BUFFER_SIZE*BUFFER_SIZE],
           int C[BUFFER_SIZE*BUFFER_SIZE], int col, int row) {
  int matA[BUFFER_SIZE*BUFFER_SIZE];
  int matB[BUFFER_SIZE*BUFFER_SIZE];

readA: for (int i = 0; i < col*row; i++) {
#pragma HLS PIPELINE II=1
    matA[i] = A[i];
  }

readB: for (int i = 0; i < col*row; i++) {
#pragma HLS PIPELINE II=1
    matB[i] = B[i];
  }

  for (int i = 0; i < col; i++) {
#pragma HLS PIPELINE II=1
    for (int j = 0; j < row; j++) {
      int tmp = 0;
      for (int k = 0; k < row; k++) {
        tmp += matA[k+i*col] * matB[j+k*col];
      }
      C[i*row+j] = tmp;
    }
  }
}
Loading the Main Function
To initialize the OpenCL API environment, the software application needs to load the FPGA binary file (.xclbin). This example uses argc/argv to pass the name of this file through the command line argument of the application.
Given these changes, the application is run as follows.
host.exe ./binary_container_1.xclbin
Where:
- host.exe
- Compiled executable for the Arm core.
- binary_container_1.xclbin
- FPGA binary file generated by the Vitis compiler.
Next, add some error checking to ensure the required command-line arguments were specified.
// Check for valid arguments
if (argc != 2) {
  printf("Usage: %s binary_container_1.xclbin\n", argv[0]);
  exit(EXIT_FAILURE);
}

// Get xclbin name
char* xclbinFilename = argv[1];
The variable declarations for the input and output matrices also change, as the allocation of memory is handled separately later in the code by creating OpenCL buffers. For now, you simply define the three vectors needed to hold the matrix data.
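For example, the sds_alloc() calls can be replaced with ordinary std::vector declarations sized to hold the matrix data. A minimal sketch (the BUFFER_SIZE value here is a stand-in for the constant defined in mmult.h):

```cpp
#include <vector>

// Stand-in for the value defined in mmult.h
const int BUFFER_SIZE = 8;

// Host-side storage replacing sds_alloc(); device-side memory
// is created later as OpenCL buffers.
int col = BUFFER_SIZE;
int row = BUFFER_SIZE;
std::vector<int> matA(col * row);
std::vector<int> matB(col * row);
std::vector<int> matC(col * row);
```

Using std::vector also provides the .data() pointers needed later by enqueueWriteBuffer() and enqueueReadBuffer().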
Using the OpenCL API
The primary difference between the SDSoC development environment and the Vitis core development kit is the use of the OpenCL APIs to manage interactions between the main function and the hardware accelerated kernels. This section of the code is marked by the following opening and closing comments.
//OPENCL HOST CODE AREA STARTS
//OPENCL HOST CODE AREA ENDS
You need to modify the host code and use the OpenCL C++ API to direct XRT to coordinate execution of the kernel with the host application. These steps are coded in the following order:
- Setup
- Specify the platform.
- Select the OpenCL device to run the kernel.
- Create an OpenCL context.
- Create a command queue.
- Create an OpenCL program.
- Create a kernel object for a hardware kernel.
- Create memory buffers for the OpenCL device.
- Execution
- Define arguments for the kernel.
- Transfer data from the host CPU to the kernel.
- Run the kernel.
- Return data from the kernel to the host application.
The following section discusses each of these steps and required code changes in detail.
- For more information on modifying the main application for use in the Vitis environment, refer to Methodology for Accelerating Applications with the Vitis Software Platform.
- For more information on OpenCL APIs in general, and specific OpenCL API commands, refer to https://www.khronos.org/registry/OpenCL/.
The following code identifies the platform and the device.
// Get Platform
std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
cl::Platform platform = platforms[0];
// Get Device
std::vector<cl::Device> devices;
cl::Device device;
platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
device = devices[0];
The platform is the Xilinx-specific implementation of the OpenCL framework, including XRT and the accelerator hardware. The device is the hardware that will run the OpenCL kernel.
With a device selected, you must create a context, which is used by the runtime to manage objects, such as command-queues, memory, programs, and kernels on one or more devices. You must also create the command-queue which executes the commands, either in the order presented or out-of-order to parallelize different requests and improve throughput. This is done as follows.
// Create Context
cl::Context context(device);
// Create Command Queue
cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE);
As described above, the host and kernel code are compiled separately to create two different outputs. The kernel is compiled into an .xclbin file using the Vitis compiler. The host application must identify and load the .xclbin file as an OpenCL program object for XRT. You must create a program in a context and identify the kernels in the program. These steps are reflected in the following code.
// Load xclbin
std::cout << "Loading: '" << xclbinFilename << "'\n";
std::ifstream bin_file(xclbinFilename, std::ifstream::binary);
bin_file.seekg (0, bin_file.end);
unsigned nb = bin_file.tellg();
bin_file.seekg (0, bin_file.beg);
char *buf = new char [nb];
bin_file.read(buf, nb);
// Creating Program from Binary File
cl::Program::Binaries bins;
bins.push_back({buf,nb});
cl::Program program(context, devices, bins);
// Create Kernel object(s)
cl::Kernel kernel_mmult(program,"mmult");
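Note that the raw new char[nb] buffer in this snippet is never freed. If you prefer, the file can be read into a std::vector&lt;char&gt;, which releases the memory automatically. A sketch (the readBinaryFile name is illustrative, and error handling is omitted):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Read a binary file (such as an xclbin) into a vector<char>.
// The buffer is freed automatically when the vector goes out of scope.
std::vector<char> readBinaryFile(const std::string &path) {
    std::ifstream bin_file(path, std::ifstream::binary);
    bin_file.seekg(0, bin_file.end);
    std::streamsize nb = bin_file.tellg();
    bin_file.seekg(0, bin_file.beg);
    std::vector<char> buf(static_cast<size_t>(nb));
    bin_file.read(buf.data(), nb);
    return buf;
}
```

The returned vector's .data() and .size() can then be pushed into cl::Program::Binaries in place of the raw pointer and length.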
In the above example, the kernel_mmult object
identifies a kernel called mmult specified in the program object (xclbin). In a later
section, you will look at the specific steps for migrating the hardware function from
the SDSoC environment to the Vitis environment.
Before executing the kernel, you must transfer data from the host application
to the device. The SDSoC environment supports two
types of transfers, data_copy and zero_copy. The Vitis
environment only supports zero_copy. The OpenCL buffers are the conduit through which data is
communicated from the host application to the kernels. To transfer data, the application
must first declare OpenCL buffer objects, and then
use API calls such as enqueueWriteBuffer() and enqueueReadBuffer() to perform the actual transfer. XRT
copies data from user-space memory to a physically contiguous region of OS kernel-space
memory that the hardware function accesses directly through an AXI bus interface.
Start by defining memory buffers for the kernel and specifying the kernel arguments as follows.
// Create Buffers (the flags describe how the kernel accesses the buffer)
cl::Buffer bufMatA = cl::Buffer(context, CL_MEM_READ_ONLY, col*row*sizeof(int), NULL, NULL);
cl::Buffer bufMatB = cl::Buffer(context, CL_MEM_READ_ONLY, col*row*sizeof(int), NULL, NULL);
cl::Buffer bufMatC = cl::Buffer(context, CL_MEM_WRITE_ONLY, col*row*sizeof(int), NULL, NULL);
// Assign Kernel arguments
int narg = 0;
kernel_mmult.setArg(narg++, bufMatA);
kernel_mmult.setArg(narg++, bufMatB);
kernel_mmult.setArg(narg++, bufMatC);
kernel_mmult.setArg(narg++, col);
kernel_mmult.setArg(narg++, row);
The OpenCL API calls create data buffers in
the specified context, defining the read/write abilities of each buffer from the
kernel's perspective. These buffers are then specified as arguments for the hardware
kernel, along with any scalar values that are passed directly, such as col and row in
the example above.
The next section of code in the main() function is left unchanged. This
implements the primary for loop to perform the specified number of tests (NUM_TESTS),
randomly populates the input matrices (matA and matB), and then outputs the matrix values using the printMatrix function. From this point, the main() function
runs the matrix multiplication (mmult()) in the
hardware accelerator.
In the SDSoC environment, the hardware
function is directly called. The hardware function call runs the accelerator as a task,
and each of the arguments to the function is transferred between the Arm processor and the PL region. Data transfers are
accomplished through data movers, such as a DMA engine, automatically inserted into the
system by the sds++ compiler.
In the Vitis environment, you must enqueue the transfer of data from the host to the local memory, enqueue the kernel to be run, and then enqueue the transfer of data from the kernel back to the host, or on to another kernel as the program requires. In this simple example, the data is simply returned to the host.
In the following code snippet, the input matrices are transferred from
the host to the device memory, the kernel is run, and the output matrix is transferred
back to the host application. By default, the OpenCL API
enqueue commands are non-blocking, meaning they return before the actual
command has completed (the CL_TRUE argument in the calls below makes the individual
transfers blocking). Calling q.finish() blocks
further execution until all commands in the command queue have completed. This ensures
the host waits for the data to be transferred back from the kernel.
// Enqueue Buffers
q.enqueueWriteBuffer(bufMatA, CL_TRUE, 0, col*row*sizeof(int), matA.data(), NULL, NULL);
q.enqueueWriteBuffer(bufMatB, CL_TRUE, 0, col*row*sizeof(int), matB.data(), NULL, NULL);
// Launch Kernel
q.enqueueTask(kernel_mmult);
// Read Data Back from Kernel
q.enqueueReadBuffer(bufMatC, CL_TRUE, 0, col*row*sizeof(int), matC.data(), NULL, NULL);
q.finish();
After this, the output matrix is printed to validate the results of the matrix multiplication. When all NUM_TESTS iterations have been run, the main() function returns.
As you can see, the steps required to migrate your main application from the SDSoC environment to the Vitis environment are straightforward. The migration is primarily driven by XRT and the OpenCL APIs, which manage the interactions between the main() function and the kernels.