HLS Kernel Design Integration into SDAccel
The main flow described in the application-centric methodology of this guide is top-down: accelerator kernels are developed and integrated within the host application of a project. In this flow, all source code is presented to SDAccel™, and selected sections of it are synthesized into accelerator modules. The flow calls the Vivado® High-Level Synthesis (HLS) tool to translate those functions into hardware-implementable accelerator code.
Alternatively, SDAccel provides a bottom-up flow, where HLS-based hardware kernels are created directly by importing from a Vivado HLS project. This allows you to perform optimizations and to validate kernel performance within the Vivado HLS project. When your kernel meets performance and resource requirements, the resulting Xilinx® object file (.xo) is handed off for inclusion into the SDx™ project. During hand-off, all kernel Vivado HLS optimization is maintained.
Figure: Vivado HLS Design Flow
The benefits of the bottom-up flow include:
- You can design, validate, and optimize the kernel before integrating it into the complete SDAccel project.
- Specific kernel optimizations are maintained for each kernel.
- Independent Vivado HLS and project locations allow separation of application and kernels.
- A Vivado HLS project can be used by multiple projects, similar to a library instantiation.
- Teams can collaborate for increased productivity.
Creating SDAccel Kernels with Vivado HLS
Running Vivado HLS to generate kernels from C/C++ for SDAccel follows the regular Vivado HLS flow. However, because the kernel operates as an accelerator in an SDAccel system, the SDAccel kernel modeling guidelines must be followed (see C/C++ modeling guide). Most importantly, pointer arguments must be modeled as AXI memory-mapped (m_axi) interfaces, while scalar parameters passed by value are mapped to an AXI4-Lite interface. This is illustrated in the following example:
#include <ap_int.h> // Vivado HLS arbitrary-precision types (ap_int, ap_uint)

void krnl_idct(const ap_int<512> *block,
               const ap_uint<512> *q,
               ap_int<512> *voutp,
               int ignore_dc,
               unsigned int blocks) {
// Pointer arguments: data moves over m_axi, base address is set over AXI4-Lite.
#pragma HLS INTERFACE m_axi port=block offset=slave bundle=p0 depth=512
#pragma HLS INTERFACE s_axilite port=block bundle=control
#pragma HLS INTERFACE m_axi port=q offset=slave bundle=p1 depth=2
#pragma HLS INTERFACE s_axilite port=q bundle=control
#pragma HLS INTERFACE m_axi port=voutp offset=slave bundle=p2 depth=512
#pragma HLS INTERFACE s_axilite port=voutp bundle=control
// Scalars passed by value and the block-level control (return) use AXI4-Lite.
#pragma HLS INTERFACE s_axilite port=ignore_dc bundle=control
#pragma HLS INTERFACE s_axilite port=blocks bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
  // Kernel body is not shown in this excerpt.
}
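The same interface rule (pointer arguments on AXI memory interfaces, by-value scalars on AXI4-Lite) applies to kernels whose arguments use native C/C++ types. The following is a minimal sketch, not taken from the guide: `krnl_scale`, its arguments, the bundle names `p0` and `p1`, and the `run_test_bench` helper are all hypothetical names chosen for illustration, and `depth=16` is simply sized to match this test bench. A standard C++ compiler ignores the HLS pragmas, so the kernel and its test bench can be checked as ordinary C++ before running Vivado HLS.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel (illustration only): multiplies each element by a factor.
// Native types are used on the interface, so the test bench and host code
// need no arbitrary-precision headers.
void krnl_scale(const int *in, int *out, int factor, unsigned int n) {
// Pointer arguments: m_axi for data, s_axilite for the base address.
#pragma HLS INTERFACE m_axi port=in offset=slave bundle=p0 depth=16
#pragma HLS INTERFACE s_axilite port=in bundle=control
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=p1 depth=16
#pragma HLS INTERFACE s_axilite port=out bundle=control
// Scalars passed by value and block-level control: s_axilite only.
#pragma HLS INTERFACE s_axilite port=factor bundle=control
#pragma HLS INTERFACE s_axilite port=n bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    for (unsigned int i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}

// Hypothetical test bench helper: runs the kernel on a small data set and
// returns the number of elements that differ from a software reference.
int run_test_bench() {
    const unsigned int n = 16;
    const int factor = 3;
    std::vector<int> in(n), out(n, 0);
    for (unsigned int i = 0; i < n; ++i) {
        in[i] = static_cast<int>(i) - 8; // mix of negative and positive values
    }
    assert(in.size() == out.size());

    krnl_scale(in.data(), out.data(), factor, n);

    int mismatches = 0;
    for (unsigned int i = 0; i < n; ++i) {
        if (out[i] != in[i] * factor) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

In an HLS project, a test bench `main` function would typically return the mismatch count, so that a nonzero value marks the simulation as failed.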
Using ap-datatypes in the interfaces requires the use of ap-datatypes in the HLS test bench as well. This can slow down C/C++ simulation, so mapping to native C/C++ types should be considered. Because most host code is based on native data types, using them in the kernel interfaces is recommended.
For information on creating a new project, see the Vivado Design Suite User Guide: High-Level Synthesis (UG902). For SDAccel kernel projects, you must select the SDAccel Bottom Up Flow check box and specify the Clock Period and Part Selection as shown in the following figure.
Figure: New Vivado HLS Project
Choose the platform by clicking the Browse button to open the Device Selection Dialog and select the accelerator board from the Device list.
Figure: Device Selection
After the project is set up, you can iterate on optimization until the implementation meets your performance and resource requirements. For more information, see Vivado Design Suite User Guide: High-Level Synthesis (UG902).
After synthesis of the optimized design completes, export the design to the SDAccel tool chain. The export command is available through the Export RTL menu item.
Figure: Export RTL as IP
You only need to confirm the XO file location, which sets the name and path of the generated .xo file. This file is imported back into SDAccel as described in Incorporating Vivado HLS Kernel Projects into SDAccel.
This completes the HLS synthesis part for SDAccel. In the following section, some required details for the command line flow are shown.
Typical Vivado HLS Script for SDAccel Synthesis
If you run your HLS synthesis through command line scripts, the following Tcl code is equivalent to the GUI flow shown before:
# Open the HLS project, set the top-level kernel, and add sources
open_project guiProj
set_top krnl_idct
add_files src/krnl_idct.cpp
add_files -tb src/idct.cpp
# Configure the solution: target part, clock period, and SDAccel settings
open_solution "solution1"
set_part {xcu200-fsgd2104-2-e} -tool vivado
create_clock -period 10 -name default
config_sdx -optimization_level none -target xocc
config_schedule -effort medium -enable_dsp_full_reg
config_compile -name_max_length 256 -pipeline_loops 64
#source "./guiProj/solution1/directives.tcl"
# Run C simulation, synthesis, and C/RTL co-simulation
csim_design
csynth_design
cosim_design
# Export the synthesized kernel as a Xilinx object (.xo) file
export_design -rtl verilog -format ip_catalog -xo \
/wrk/bugs/xoFlow/idct_hls/krnl_idct.xo
Incorporating Vivado HLS Kernel Projects into SDAccel
The Vivado HLS output is the kernel code exported as a Xilinx object file (.xo). This file can be integrated into SDAccel by selecting the object file as input (see Adding Sources for more information). When SDAccel imports the .xo file, the kernel name is extracted automatically, so the host code can call the accelerator by that name.
During SDAccel compilation, it is possible to create multiple compute units from the kernels, but the implementation remains the same as designed during the Vivado HLS run.
In SDAccel, the regular debug and analysis features are fully supported for this flow. You can build the hardware emulation flow to test and debug the implementation in detail and to tune system and host code performance.
Known Limitations
- No software emulation support for projects with .xo files (potential missing and duplicated header files).
- GDB Kernel debug in hardware emulation flow is not supported.
- HLS analysis functionality is only available in the Vivado HLS project and not from SDAccel.