Accumulate Squared

The xFaccumulateSquare function adds the square of an input image (src1) to the accumulator image (src2) and writes the accumulated result to a separate output image (dst):

dst(x, y) = src1(x, y)^2 + src2(x, y)

The accumulated result is returned through a separate argument (dst) rather than being written back to src2. Because the function is built on streams, a bi-directional (in-place) accumulator is not possible in this implementation.
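
The per-pixel operation described above can be sketched as a plain C++ software reference model. This is not the HLS kernel: the function name accumulateSquareRef and the use of std::vector are assumptions made for illustration, and both inputs are assumed to be 8-bit, as the parameter table states. Under that assumption the largest possible result is 255*255 + 255 = 65280, which fits in the 16-bit output without saturation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference model (hypothetical helper, not part of xfOpenCV):
// dst[i] = src1[i]^2 + src2[i] for every pixel.
// Inputs are 8-bit and the output is 16-bit, mirroring XF_8UC1 / XF_16UC1.
std::vector<std::uint16_t> accumulateSquareRef(
    const std::vector<std::uint8_t>& src1,
    const std::vector<std::uint8_t>& src2) {
    assert(src1.size() == src2.size());  // both images must have the same size
    std::vector<std::uint16_t> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        // Widen before multiplying so the square is not truncated to 8 bits.
        const std::uint32_t sq = static_cast<std::uint32_t>(src1[i]) * src1[i];
        dst[i] = static_cast<std::uint16_t>(sq + src2[i]);
    }
    return dst;
}
```

A model like this can serve as a golden reference when comparing against the hardware function's output in a test bench.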

API Syntax

template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void xFaccumulateSquare (
xF::Mat<SRC_T, ROWS, COLS, NPC> src1,
xF::Mat<SRC_T, ROWS, COLS, NPC> src2,
xF::Mat<DST_T, ROWS, COLS, NPC> dst)

Parameter Descriptions

The following table describes the template and the function parameters.

Table 1. xFaccumulateSquare Function Parameter Descriptions
Parameter   Description
SRC_T       Input pixel type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1)
DST_T       Output pixel type. Only 16-bit, unsigned, 1 channel is supported (XF_16UC1)
ROWS        Maximum height of input and output image (must be a multiple of 8)
COLS        Maximum width of input and output image (must be a multiple of 8)
NPC         Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1 pixel and 8 pixel operations respectively.
src1        Input image
src2        Input image
dst         Output image
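
The constraints in the table (dimensions that are multiples of 8, and 1 or 8 pixels per cycle) could be checked at compile time before instantiating the function. The following is a minimal sketch only: isSupportedConfig, kNppc1, and kNppc8 are hypothetical names, and the numeric values 1 and 8 are assumed stand-ins for XF_NPPC1 and XF_NPPC8, not taken from the library headers.

```cpp
#include <cassert>

// Hypothetical stand-ins for the XF_NPPC1 / XF_NPPC8 options;
// the numeric values are assumptions made for illustration.
constexpr int kNppc1 = 1;
constexpr int kNppc8 = 8;

// Returns true when the configuration satisfies the documented constraints:
// ROWS and COLS are positive multiples of 8 and NPC is one of the two options.
constexpr bool isSupportedConfig(int rows, int cols, int npc) {
    return rows > 0 && cols > 0 &&
           rows % 8 == 0 && cols % 8 == 0 &&
           (npc == kNppc1 || npc == kNppc8);
}

// Compile-time checks: an HD frame passes, an odd height does not.
static_assert(isSupportedConfig(1080, 1920, kNppc8), "HD frame, 8 pixels per cycle");
static_assert(!isSupportedConfig(1081, 1920, kNppc1), "height is not a multiple of 8");
```

Placing a static_assert of this kind next to the template instantiation turns an unsupported configuration into a build error instead of a synthesis-time surprise.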

Resource Utilization

The following table summarizes the resource utilization in different configurations, generated using the Vivado HLS 2017.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale HD (1080x1920) image.

Table 2. xFaccumulateSquare Function Resource Utilization Summary
Operating Mode   Operating Frequency (MHz)   Utilization Estimate
                                             BRAM_18K   DSP_48E   FF    LUT   CLB
1 pixel          300                         0          1         71    52    14
8 pixel          150                         0          8         401   247   48

Performance Estimate

The following table summarizes the performance in different configurations, generated using the Vivado HLS 2017.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale HD (1080x1920) image.

Table 3. xFaccumulateSquare Function Performance Estimate Summary
Operating Mode                 Latency Estimate: Max Latency (ms)
1 pixel operation (300 MHz)    6.9
8 pixel operation (150 MHz)    1.6

Deviation from OpenCV

In OpenCV, the accumulated squared image is stored in the second input image, so src2 acts as both input and output:

src2(x, y) = src1(x, y)^2 + src2(x, y)

In the xfOpenCV implementation, by contrast, the accumulated squared image is stored in a separate output image:

dst(x, y) = src1(x, y)^2 + src2(x, y)
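
The difference in calling convention can be illustrated with two simplified C++ models. Neither function calls OpenCV or the HLS kernel: accumulateSquareInPlace and accumulateSquareSeparate are hypothetical names, and the floating-point accumulator in the first model is an assumption that mirrors the accumulator type OpenCV's accumulateSquare expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// OpenCV-style convention (simplified model, not a call into OpenCV):
// the accumulator is both read and overwritten.
void accumulateSquareInPlace(const std::vector<std::uint8_t>& src,
                             std::vector<float>& acc) {
    assert(src.size() == acc.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const float v = static_cast<float>(src[i]);
        acc[i] += v * v;  // result lands back in the accumulator
    }
}

// xfOpenCV-style convention (simplified model, not the HLS kernel):
// both inputs are read-only and the result goes to a separate buffer.
void accumulateSquareSeparate(const std::vector<std::uint8_t>& src1,
                              const std::vector<std::uint8_t>& src2,
                              std::vector<std::uint16_t>& dst) {
    assert(src1.size() == src2.size());
    dst.resize(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        // Widen before multiplying so the square is not truncated to 8 bits.
        const std::uint32_t sq = static_cast<std::uint32_t>(src1[i]) * src1[i];
        dst[i] = static_cast<std::uint16_t>(sq + src2[i]);
    }
}
```

In the second form src2 is left unchanged after the call, which matches a streaming design where each image is consumed in one direction only.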