Introduction to Debugging in SDSoC

The SDSoC™ environment includes an Eclipse-based integrated development environment (IDE) for implementing heterogeneous embedded systems. SDSoC supports Arm® Cortex™-based applications using the Zynq®-7000 SoC and Zynq® UltraScale+™ MPSoC devices, as well as MicroBlaze™ processor-based applications on all Xilinx® SoCs and FPGAs.

This user guide introduces the debugging capabilities of the SDSoC environment, and provides you with detailed instructions on how to analyze any failure encountered within the SDSoC flow.

Note: This user guide does not cover performance issues. If you encounter no tool problems and the design is functionally correct, see the SDSoC Profiling and Optimization Guide to determine whether the performance of the design can be improved further.

SDSoC Environment Overview

The SDSoC environment includes a system compiler that transforms C/C++ programs into complete hardware/software systems with select functions compiled into the programmable logic (PL). The SDSoC system compiler analyzes a program to determine the data flow between software and hardware functions, and generates an application-specific system-on-chip (SoC) to realize the program.

To achieve high performance, each hardware function runs as an independent thread; the system compiler generates hardware and software components that ensure synchronization between hardware and software threads, while enabling pipelined computation and communication. Application code can involve many hardware functions, multiple instances of a specific hardware function, and calls to a hardware function from different parts of the program.

The SDx integrated development environment (IDE) supports software development workflows including profiling, compilation, linking, system performance analysis, and debugging. It also provides a fast performance estimation capability to enable exploration of the hardware/software interface before committing to a full hardware compile.

The SDSoC system compiler targets a base platform and invokes the Vivado® High-Level Synthesis (HLS) tool to compile synthesizable C/C++ functions into programmable logic. The system compiler then generates a complete hardware system, including DMAs, interconnects, hardware buffers, other IP, and the FPGA bitstream by invoking the Vivado Design Suite tools. To ensure that all hardware function calls preserve their original behavior, the SDSoC system compiler generates system-specific software stubs and configuration data. The program includes the function calls to drivers required to use the generated IP blocks. Application and generated software is compiled and linked using a standard GNU toolchain.

By generating complete applications from a single source, the system compiler lets you iterate over design and architecture changes by refactoring at the program level, which reduces the time needed to achieve working programs running on the target platform.


The following terms are widely used while designing in the SDSoC environment. The terms and their definitions are provided below.

Accelerator
Portions of the application code that have been implemented in hardware in the FPGA general interconnect. These are also called hardware functions.
Data Mover
The data mover transfers data between accelerators, and between the processing system (PS) and accelerators. The SDSoC environment can generate various types of data movers based on the properties and size of the data being transferred.
Pipelining
Pipelining is a technique to increase instruction-level parallelism in the hardware implementation of an algorithm by overlapping independent stages of operations or functions. The data dependence in the original software implementation is preserved for functional equivalence, but the required circuit is divided into a chain of independent stages. All stages in the chain run in parallel on the same clock cycle. The only difference is the source of data for each stage. Each stage in the computation receives its data values from the result computed by the preceding stage during the previous clock cycle.
Pragma
Special directives that can be inserted into the source code to guide the system compiler. In the SDSoC environment, you control the system generation process by structuring hardware functions and calls to hardware functions in a way that balances communication and computation, and by inserting pragmas into your source code to guide the system compiler.
Processor
In the context of the SDSoC environment, a processor is either a soft processor, such as a MicroBlaze processor, or a hard processor, such as the Arm processors on Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs.
System Port
A system port connects a data mover to the PS. It can be an ACP, AFI (corresponding to high-performance ports), MIG (corresponding to a PL-based DDR memory controller), or a stream port on the Zynq.
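
The pipelining and pragma entries above can be illustrated with a minimal sketch. It assumes the Vivado HLS PIPELINE pragma placed inside a loop body; the function name mac_hw, the helper run_mac_check, and the array length N are hypothetical. A standard C++ compiler ignores the pragma and runs the loop as ordinary software, which is how the functional check below is meant to be read.

```cpp
#include <cassert>

constexpr int N = 8;  // hypothetical array length

// Hypothetical hardware function candidate. The pragma asks the HLS tool to
// overlap loop iterations so that a new iteration can start every clock cycle
// (initiation interval of 1). It does not change the computed result.
void mac_hw(const int a[N], const int b[N], int out[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = a[i] * b[i] + i;
    }
}

// Software-only check of functional behavior (illustrative helper).
int run_mac_check() {
    int a[N], b[N], out[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = 2;
    }
    mac_hw(a, b, out);
    for (int i = 0; i < N; ++i) {
        assert(out[i] == 3 * i);  // i * 2 + i
    }
    return out[N - 1];
}
```

Because pipelining preserves the data dependence of the original code, the software result and the hardware result are expected to match; only the timing differs.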

Elements of SDSoC

The SDSoC environment includes the following features:

  • The sds++ system compiler, which generates complete hardware/software systems. The sds++ system compiler employs underlying features from the Vivado Design Suite System Edition, including the Vivado High-Level Synthesis (HLS) tool, Vivado IP integrator, IP libraries for data movement and interconnect, and tools for RTL synthesis, placement, routing, and bitstream generation.
  • An Eclipse-based integrated development environment (IDE) to create and manage application projects and workflows.
  • A system performance estimation capability to explore different scenarios for the hardware/software interface.

The SDSoC environment also inherits many of the tools in the Xilinx Software Development Kit (SDK), including GNU toolchains for Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs, standard libraries (for example, glibc), and the Target Communication Framework (TCF) for communicating with embedded processor targets. It also features a performance analysis perspective within the Eclipse/CDT-based IDE.

The sds++ system compiler generates an application-specific system-on-chip for a targeted platform. The environment includes a number of standard base platforms for application development, and other platforms can be developed by third-party partners, or by SDSoC design teams. The SDSoC Environment Platform Development Guide describes how to create a hardware platform design in the Vivado Design Suite, configure platform interfaces, and define the corresponding software runtime environment to build a platform for use in the SDx™ IDE.

The SDx™ IDE lets you customize a target platform with application-specific hardware accelerators, and data motion networks connecting accelerators to the platform. A simplified Zynq and DDR configuration with memory access ports and hardware accelerators is shown below.

Figure: Simplified Zynq + DDR Diagram Showing Memory Access Ports and Memories

Execution Model of an SDSoC Application

The execution model for an SDSoC environment application can be understood in terms of the normal execution of a C++ program running on the target CPU after the platform has booted. It is useful to understand how a C++ binary executable interfaces to hardware.

The set of declared hardware functions within a program is compiled into hardware accelerators that are accessed with the standard C runtime through calls into these functions. Each hardware function call in effect invokes the accelerator as a task and each of the arguments to the function is transferred between the CPU and the accelerator, accessible by the program after accelerator task completion. Data transfers between memory and accelerators are accomplished through data movers, such as a DMA engine, automatically inserted into the system by the sds++ system compiler taking into account user data mover pragmas such as zero_copy.
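
As a rough illustration of the paragraph above, the following sketch shows a function that could be selected for hardware, annotated with a zero_copy data mover pragma. The names scale_hw and run_scale_check are hypothetical, and the sketch assumes the pragma is placed immediately before the function definition. Compiled with a standard C++ compiler, the pragma is ignored and the call runs as plain software. On a target, zero_copy buffers would typically need to be physically contiguous (for example, allocated with sds_alloc from sds_lib), which this software-only sketch does not do.

```cpp
#include <cassert>
#include <vector>

// Hypothetical hardware function candidate. With zero_copy, the accelerator
// is expected to access the arrays in shared memory rather than through a
// DMA copy. Array extents are expressed with the function argument n.
#pragma SDS data zero_copy(in[0:n], out[0:n])
void scale_hw(const int *in, int *out, int n, int factor) {
    for (int i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}

// Caller: from the program's point of view this is an ordinary function
// call, and the outputs are available once the call returns.
// Assumption: std::vector is used only so the sketch runs on a host.
long run_scale_check(int n, int factor) {
    std::vector<int> in(n), out(n, 0);
    for (int i = 0; i < n; ++i) {
        in[i] = i;
    }
    scale_hw(in.data(), out.data(), n, factor);
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        assert(out[i] == in[i] * factor);  // compare against software model
        sum += out[i];
    }
    return sum;
}
```

Checking the result against a software model in the caller is a simple way to confirm functional behavior before moving to emulation or hardware.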

Figure: Architecture of an SDSoC System

To ensure program correctness, the system compiler intercepts each call to a hardware function, and replaces it with a call to a generated stub function that has an identical signature but with a derived name. The stub function orchestrates all data movement and accelerator operation, synchronizing software and accelerator hardware at the exit of the hardware function call. Within the stub, all accelerator and data mover control is realized through a set of send and receive APIs provided by the sds_lib library.

When program dataflow between hardware function calls involves array arguments that are not accessed after the function calls have been invoked within the program (other than destructors or free() calls), and when the hardware accelerators can be connected using streams, the system compiler transfers data from one hardware accelerator to the next through direct hardware stream connections, rather than implementing a round trip to and from memory. This optimization can result in significant performance gains and reduction in hardware resources.
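
The following sketch shows the kind of call pattern this optimization applies to. The function names add_one_hw, double_hw, and run_chain are hypothetical; the access_pattern pragma is used on the assumption that sequential access is what allows a streaming interface. Whether the system compiler actually creates a direct stream connection depends on the platform and the rest of the program.

```cpp
#include <cassert>

constexpr int N = 32;  // hypothetical array length

// Two hypothetical hardware function candidates with sequential access.
#pragma SDS data access_pattern(in:SEQUENTIAL, out:SEQUENTIAL)
void add_one_hw(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) {
        out[i] = in[i] + 1;
    }
}

#pragma SDS data access_pattern(in:SEQUENTIAL, out:SEQUENTIAL)
void double_hw(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) {
        out[i] = in[i] * 2;
    }
}

int run_chain(int seed) {
    int src[N], tmp[N], dst[N];
    for (int i = 0; i < N; ++i) {
        src[i] = seed + i;
    }
    add_one_hw(src, tmp);  // tmp is produced by the first accelerator
    double_hw(tmp, dst);   // tmp is consumed by the second accelerator
    // tmp is never read by the CPU after the calls, so it is a candidate
    // for a direct accelerator-to-accelerator stream.
    assert(dst[0] == (seed + 1) * 2);
    return dst[N - 1];
}
```

If the program read tmp after the two calls (for example, to print it while debugging), the intermediate data would have to be written back to memory, and the direct connection would no longer apply.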

The SDSoC program execution model includes the following steps:
  1. Initialization of the sds_lib library occurs during the program constructor before entering main().
  2. Within a program, every call to a hardware function is replaced by a call to a stub function that has the same signature (other than the name) as the original function. Within the stub function, the following steps occur:
    a. A synchronous accelerator task control command is sent to the hardware.
    b. For each argument to the hardware function, an asynchronous data transfer request is sent to the appropriate data mover, with an associated wait() handle. A non-void return value is treated as an implicit output scalar argument.
    c. A barrier wait() is issued for each transfer request. If a data transfer between accelerators is implemented as a direct hardware stream, the barrier wait() for this transfer occurs in the stub function for the last in the chain of accelerator functions for this argument.
  3. Clean up of the sds_lib library occurs during the program destructor, upon exiting main().
TIP: Steps 2a–2c ensure that program correctness is preserved at the entrance and exit of accelerator pipelines while enabling concurrent execution within the pipelines.
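
The ordering inside a stub can be pictured with a small single-threaded software model. This is only an analogy: TransferHandle, accumulate_stub_model, and the log strings are hypothetical and are not the sds_lib API or the code that the system compiler generates. The model records the order of the steps so that it can be inspected.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data mover request with a wait() handle.
struct TransferHandle {
    std::string name;
    std::function<void()> work;  // deferred transfer, completed by wait()
    bool done;

    TransferHandle(std::string n, std::function<void()> w)
        : name(std::move(n)), work(std::move(w)), done(false) {}

    void wait(std::vector<std::string>& log) {
        assert(work && "transfer must have work attached");
        if (!done) {
            work();
            done = true;
        }
        log.push_back("wait " + name);
    }
};

// Model of a stub for a hypothetical function: int accumulate_hw(const int in[4]);
int accumulate_stub_model(const int in[4], std::vector<std::string>& log) {
    int hw_in[4] = {0, 0, 0, 0};
    int hw_ret = 0;

    // Step 2a: accelerator task control command.
    log.push_back("cmd start");

    // Step 2b: one transfer request per argument; the return value is
    // treated as an implicit output scalar.
    std::vector<TransferHandle> handles;
    handles.emplace_back("in", [&] {
        for (int i = 0; i < 4; ++i) hw_in[i] = in[i];
    });
    log.push_back("request in");
    handles.emplace_back("return", [&] {
        for (int i = 0; i < 4; ++i) hw_ret += hw_in[i];
    });
    log.push_back("request return");

    // Step 2c: barrier wait on every request before returning to the caller.
    for (auto& h : handles) {
        h.wait(log);
    }
    return hw_ret;
}
```

When an application hangs, this ordering gives a way to reason about where it is stuck: a call that never returns is usually blocked in one of the barrier waits, waiting on a transfer that has not completed.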

Sometimes, you have insight into potential concurrent execution of accelerator tasks that the system compiler cannot infer automatically. For these cases, the sds++ system compiler supports a #pragma SDS async(ID) that can be inserted immediately preceding a call to a hardware function. This pragma instructs the compiler to generate a stub function without any barrier wait() calls for data transfers. As a result, after issuing all data transfer requests, control returns to the program, enabling concurrent execution of the program while the accelerator is running. It is then your responsibility to insert a #pragma SDS wait(ID) within the program at appropriate synchronization points. Each wait pragma is resolved into an sds_wait(ID) API call to correctly synchronize hardware accelerators, their implicit data movers, and the CPU.

IMPORTANT: Every async(ID) pragma requires a matching wait(ID) pragma.
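
A minimal sketch of matched async and wait pragmas follows. The names square_hw and run_overlap, the array size, and the ID values 1 and 2 are hypothetical. With a standard C++ compiler the pragmas are ignored and the calls run sequentially, so the sketch only demonstrates placement and the functional result, not actual concurrency.

```cpp
#include <cassert>

constexpr int N = 16;  // hypothetical array length

// Hypothetical hardware function candidate.
void square_hw(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) {
        out[i] = in[i] * in[i];
    }
}

int run_overlap() {
    int a[N], b[N], ra[N], rb[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = 2 * i;
    }

    // Each async pragma immediately precedes the call it applies to.
#pragma SDS async(1)
    square_hw(a, ra);
#pragma SDS async(2)
    square_hw(b, rb);

    // CPU work placed here must not read ra or rb, because the results
    // are not guaranteed to be available until the matching wait.
    int cpu_sum = 0;
    for (int i = 0; i < N; ++i) {
        cpu_sum += a[i];
    }

    // One wait per async ID, placed before the first use of the results.
#pragma SDS wait(1)
#pragma SDS wait(2)
    assert(ra[0] == a[0] * a[0]);
    return ra[N - 1] + rb[N - 1] + cpu_sum;
}
```

A missing wait, or a read of an output array before its wait, is a common source of results that are correct in a software-only build but wrong on hardware, which makes this pattern worth checking first when debugging asynchronous calls.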

SDSoC Build Process

The SDSoC build process uses a standard compilation and linking process. Similar to g++, the sds++ system compiler invokes sub-processes to accomplish compilation and linking.

As shown in the following figure, compilation covers more than the object code that runs on the CPU. It also compiles and links hardware functions into IP blocks using the Vivado High-Level Synthesis (HLS) tool, and creates standard object files (.o) using the target CPU toolchain. System linking consists of program analysis of caller/callee relationships for all hardware functions, and the generation of an application-specific hardware/software network to implement every hardware function call. The sds++ system compiler invokes all necessary tools: Vivado HLS (the function compiler), the Vivado Design Suite to implement the generated hardware system, and the Arm compiler and sds++ linker to create the application binaries that run on the CPU and invoke the accelerator stubs for each hardware function. The output is a complete bootable system for an SD card.

Figure: SDSoC Build Process

The compilation process includes the following tasks:

  1. Analyzing the code and running a compilation for the main application on the Arm core, as well as a separate compilation for each of the hardware accelerators.
  2. Compiling the application code through standard GNU Arm compilation tools with an object (.o) file produced as final output.
  3. Running the hardware accelerated functions through the HLS tool to start the process of custom hardware creation with an object (.o) file as output.

After compilation, the linking process includes the following tasks:

  1. Analyzing the data movement through the design and modifying the hardware platform to accept the accelerators.
  2. Implementing the hardware accelerators into the programmable logic (PL) region using the Vivado Design Suite to run synthesis and implementation, and generate the bitstream for the device.
  3. Updating the software images with hardware access APIs to call the hardware functions from the embedded processor application.
  4. Producing an integrated SD card image that can boot the board with the application in an Executable and Linkable Format (ELF) file.

SDSoC Debug Flow Overview

The systems produced by the SDSoC environment are high-performance, complex, and composed of hardware and software components. It can be difficult to understand the execution of applications in such systems with portions of software running in a processor, hardware accelerators executing in the programmable fabric, and many simultaneous data transfers between them. The SDSoC environment lets you create and debug projects using the Xilinx System Debugger (XSDB), and provides sophisticated hardware/software event tracing, offering an integrated timeline view of data transfers and accelerator tasks, including driver software setup and execution in hardware. Outside the SDx IDE, you can use command line or scripting options to debug your projects.

The SDSoC development environment lets you direct the compile and link commands of the build process at either a system emulation target or the hardware target of the specified platform. As an alternative to building a complete system, you can create a system emulation model that consists of the target platform and application binaries. For the emulation target, the sds++ system compiler creates a simulation model using the source files for the accelerator functions.

System emulation is one of the most capable debug features in the SDSoC environment. It can help debug functional issues and determine why an application is hanging. This feature is only available on Xilinx base platforms, including the ZC702, ZC706, ZCU102, ZCU104, ZCU106, and ZedBoard base platforms.

After you identify the hardware functions, you can use system emulation to quickly compile the logic and verify the entire system. System emulation uses a Quick Emulator (QEMU)-based model that runs the cross-compiled Arm code and interacts with the hardware accelerators running in the Vivado simulator. The RTL simulator can display waveforms, or it can be run without waveforms for faster simulation. The emulator can be run within the SDx IDE or on the command line (sdsoc_emulator), providing accurate visibility of the final hardware implementation without the need to compile the system into a bitstream and program the device on the board.

Figure: System Emulation Flow

When targeting the hardware platform, you can also enable hardware and software event tracing to analyze the execution of events, and identify any issues (see Hardware/Software Event Tracing). If there are problems with respect to the hardware design itself, you can use hardware debug from the Vivado Lab Edition tools by inserting debug cores in the hardware functions implemented in the SDSoC environment. The following flow chart shows a typical hardware build and debug process.

Figure: Hardware Build and Debug Flow

Xilinx base platforms support both system emulation and hardware target builds. Custom and third-party platforms, without emulation capabilities, support only the hardware build and debug flow.

System Emulation

On Xilinx base platforms, you can use system emulation to debug register transfer level (RTL) transactions in the entire system (PS and PL). Running your application on the SDSoC emulator (sdsoc_emulator) gives you visibility of data transfers with a debugger. You can debug system hangs and inspect associated data transfers in the simulation waveform view, which gives you visibility into signals on the hardware blocks associated with the data transfer.

Hardware Execution Flow

During hardware execution, you can use the actual hardware platform to run the accelerated application. You can create a debug configuration of the hardware that includes special debug logic in the accelerators, such as the System Integrated Logic Analyzer (System ILA), Virtual Input/Output (VIO) debug cores, and AXI performance monitors. The SDSoC environment provides specific hardware debug capabilities using the Vivado hardware manager, with waveform analysis, kernel activity reports, and memory access analysis to provide visibility into these critical hardware issues.

In-system debugging lets you debug your design in real time, on your target hardware. This is an essential step in design completion, because some situations are extremely hard to replicate in a simulator and must be debugged in the running hardware. In this step, you place debug cores into your design so that you can observe and control it. After the debugging process is complete, you can remove the debug cores to increase performance and reduce resource usage of the device.

The SDx IDE and command line options provide ways to instrument your design for debugging. The --dk compiler switch lets you add ILA debug cores to the interfaces of your hardware function. To debug a C-callable IP that is used in your application code, you must instantiate the required debug cores in the RTL code of the IP before packaging it as a C-callable IP.

IMPORTANT: Debugging the hardware function on the SDSoC platform hardware requires additional logic to be incorporated into the overall hardware model. This means that if hardware debugging is enabled, there is some impact on resource utilization of the Xilinx device, as well as some impact on the performance of the hardware function.

Connecting to the Hardware

The board connection requirements are slightly different depending on the operating system: standalone, FreeRTOS, or Linux.
  • For standalone and FreeRTOS, you must download the ELF file to the board using the USB/JTAG interface. Trace data is read out over the same USB/JTAG interface as well.
  • For Linux, the SDx environment assumes the OS boots from the SD card. It then copies the .elf file and runs it using the TCP/TCF agent running in Linux over the Ethernet connection between the board and host PC. The trace data is read out over the USB/JTAG interface. Both USB/JTAG and TCP/TCF agent interfaces are needed for tracing Linux applications.
The figure below shows the connections required.

Figure: Connections Required When Using Trace with Different Operating Systems

Event Tracing

The event tracing feature provides a detailed view of what is happening in the system during the execution of an application. Trace events are produced and gathered into a timeline view, giving you a perspective of the running application. This detailed view can help you understand the performance of your application given the workload, hardware/software partitioning, and system design choices. This view enables event tracing of software running on the processor, as well as hardware accelerators and data transfer links in the system. Such information helps you to identify problems, optimize the design, and improve system implementation.

Tracing an application produces a log that records information about system execution. Unlike event logging, which records instantaneous events at particular times, event tracing shows how events correlate over their full duration. The goal of tracing is to help debug execution by observing what happened when, and how long events took. This is best used to analyze performance and to get an indication of whether there is an application hang.
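
The difference between instantaneous log entries and duration-based trace events can be sketched in a few lines. The record layout below is hypothetical and is not the SDSoC trace format; TraceRecord, TraceSpan, pair_records, and overlaps are illustrative names. The sketch pairs start and stop records from the same source into spans, and then tests whether two spans overlap in time.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical instantaneous record: a source name, a start/stop flag,
// and a timestamp in arbitrary ticks.
struct TraceRecord {
    std::string source;
    bool is_start;
    std::uint64_t timestamp;
};

// Hypothetical duration event built from a start/stop pair.
struct TraceSpan {
    std::string source;
    std::uint64_t start;
    std::uint64_t stop;
};

// Pair each stop record with the most recent start from the same source.
std::vector<TraceSpan> pair_records(const std::vector<TraceRecord>& records) {
    std::map<std::string, std::uint64_t> open;
    std::vector<TraceSpan> spans;
    for (const auto& r : records) {
        if (r.is_start) {
            open[r.source] = r.timestamp;
        } else {
            auto it = open.find(r.source);
            assert(it != open.end() && "stop record without matching start");
            spans.push_back(TraceSpan{r.source, it->second, r.timestamp});
            open.erase(it);
        }
    }
    return spans;
}

// Two spans overlap when each one starts before the other stops.
bool overlaps(const TraceSpan& a, const TraceSpan& b) {
    return a.start < b.stop && b.start < a.stop;
}
```

In this model, a source with a start record and no stop record never produces a span. That is the kind of pattern to look for in a timeline when investigating a possible hang.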