C++ Arbitrary Precision Fixed-Point Types

C++ functions can take advantage of the arbitrary precision fixed-point types included with Vitis HLS. The following figure summarizes the basic features of these fixed-point types:

  • The word can be signed (ap_fixed) or unsigned (ap_ufixed).
  • A word of any arbitrary width W can be defined.
  • The number of bits above the binary point, I, also defines the number of fractional bits in the word, W-I (represented by B in the following figure).
  • The type of rounding or quantization (Q) can be selected.
  • The overflow behavior (O and N) can be selected.
Figure 1: Arbitrary Precision Fixed-Point Types
TIP: The arbitrary precision fixed-point types can be used when header file ap_fixed.h is included in the code.

Arbitrary precision fixed-point types use more memory during C simulation. If using very large arrays of ap_[u]fixed types, refer to the discussion of C simulation in Arrays.

The advantages of using fixed-point types are:

  • They allow fractional numbers to be easily represented.
  • When variables have a different number of integer and fractional bits, the alignment of the binary point is handled automatically.
  • There are numerous options to handle how rounding should happen when there are too few fractional bits to represent the precision of the result.
  • There are numerous options to handle how variables should overflow when the result is greater than the number of integer bits can represent.

These attributes are summarized by examining the code in the example below. First, the header file ap_fixed.h is included. The ap_fixed types are then defined using the typedef statement:

  • A 10-bit input: 8-bit integer value with 2 decimal places.
  • A 6-bit input: 3-bit integer value with 3 decimal places.
  • A 22-bit variable for the accumulation: 17-bit integer value with 5 decimal places.
  • A 36-bit variable for the result: 30-bit integer value with 6 decimal places.

The function contains no code to manage the alignment of the decimal point after operations are performed. The alignment is done automatically.

The following code sample shows ap_fixed type.

#include "ap_fixed.h"

typedef ap_ufixed<10,8, AP_RND, AP_SAT> din1_t;
typedef ap_fixed<6,3, AP_RND, AP_WRAP> din2_t;
typedef ap_fixed<22,17, AP_TRN, AP_SAT> dint_t;
typedef ap_fixed<36,30> dout_t;

dout_t cpp_ap_fixed(din1_t d_in1, din2_t d_in2) {

 static dint_t sum;
 sum += d_in1; 
 return sum * d_in2;
}

Using ap_(u)fixed types, the C++ simulation is bit accurate. Fast simulation can validate the algorithm and its accuracy. After synthesis, the RTL exhibits the identical bit-accurate behavior.

Arbitrary precision fixed-point types can be freely assigned literal values in the code. This is shown in the test bench (see the example below) used with the example above, in which the values of in1 and in2 are declared and assigned constant values.

When assigning literal values involving operators, the literal values must first be cast to ap_(u)fixed types. Otherwise, the C compiler and Vitis HLS interpret the literal as an integer or float/double type and may fail to find a suitable operator. As shown in the following example, in the assignment of in1 = in1 + din1_t(0.25), the literal 0.25 is cast to an ap_fixed type.

#include <cmath>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <cstdio>
using namespace std;
#include "ap_fixed.h"

typedef ap_ufixed<10,8, AP_RND, AP_SAT> din1_t;
typedef ap_fixed<6,3, AP_RND, AP_WRAP> din2_t;
typedef ap_fixed<22,17, AP_TRN, AP_SAT> dint_t;
typedef ap_fixed<36,30> dout_t;

dout_t cpp_ap_fixed(din1_t d_in1, din2_t d_in2);
int main()
 {
 ofstream result;
 din1_t in1 = 0.25;
 din2_t in2 = 2.125;
 dout_t output;
 int retval=0;


 result.open("result.dat");
 // Persistent manipulators
 result << right << fixed << setbase(10) << setprecision(15);

 for (int i = 0; i <= 250; i++)
 {
 output = cpp_ap_fixed(in1,in2);

 result << setw(10) << i;
 result << setw(20) << in1;
 result << setw(20) << in2;
 result << setw(20) << output;
 result << endl;

 in1 = in1 + din1_t(0.25);
 in2 = in2 - din2_t(0.125);
 }
 result.close();

 // Compare the results file with the golden results
 retval = system("diff --brief -w result.dat result.golden.dat");
 if (retval != 0) {
 printf("Test failed  !!!\n"); 
 retval=1;
 } else {
 printf("Test passed !\n");
 }
 }

 // Return 0 if the test passes
 return retval;
}

Fixed-Point Identifier Summary

The following table shows the quantization and overflow modes.

TIP: Quantization and overflow modes that do more than the default behavior of standard hardware arithmetic (wrap and truncate) result in operators with more associated hardware. It costs logic (LUTs) to implement the more advanced modes, such as round to minus infinity or saturate symmetrically.
Table 1. Fixed-Point Identifier Summary
Identifier Description
W Word length in bits
I The number of bits used to represent the integer value, that is, the number of integer bits to the left of the binary point. When this value is negative, it represents the number of implicit sign bits (for signed representation), or the number of implicit zero bits (for unsigned representation) to the right of the binary point. For example,
ap_fixed<2, 0> a = -0.5;    // a can be -0.5,

ap_ufixed<1, 0> x = 0.5;    // 1-bit representation. x can be 0 or 0.5
ap_ufixed<1, -1> y = 0.25;  // 1-bit representation. y can be 0 or 0.25
const ap_fixed<1, -7> z = 1.0/256;  // 1-bit representation for z = 2^-8
Q Quantization mode: This dictates the behavior when greater precision is generated than can be represented by the smallest fractional bit in the variable used to store the result.
ap_fixed Types Description
AP_RND Round to plus infinity
AP_RND_ZERO Round to zero
AP_RND_MIN_INF Round to minus infinity
AP_RND_INF Round to infinity
AP_RND_CONV Convergent rounding
AP_TRN Truncation to minus infinity (default)
AP_TRN_ZERO Truncation to zero
O

Overflow mode: This dictates the behavior when the result of an operation exceeds the maximum (or minimum in the case of negative numbers) possible value that can be stored in the variable used to store the result.

ap_fixed Types Description
AP_SAT (see note 1) Saturation
AP_SAT_ZERO (see note 1) Saturation to zero
AP_SAT_SYM (see note 1) Symmetrical saturation
AP_WRAP Wrap around (default)
AP_WRAP_SM Sign magnitude wrap around
N This defines the number of saturation bits in overflow wrap modes.
  1. Using the AP_SAT* modes can result in higher resource usage as extra logic will be needed to perform saturation and this extra cost can be as high as 20% additional LUT usage.

C++ Arbitrary Precision Fixed-Point Types: Reference Information

For comprehensive information on the methods, synthesis behavior, and all aspects of using the ap_(u)fixed<N> arbitrary precision fixed-point data types, see C++ Arbitrary Precision Fixed-Point Types. This section includes:

  • Techniques for assigning constant and initialization values to arbitrary precision integers (including values greater than 1024-bit).
  • A detailed description of the overflow and saturation modes.
  • A description of Vitis HLS helper methods, such as printing, concatenating, bit-slicing and range selection functions.
  • A description of operator behavior, including a description of shift operations (a negative shift value results in a shift in the opposite direction).
IMPORTANT: For the compiler to process, you must use the appropriate header files for the language.

C++ Arbitrary Precision Fixed-Point Types

Vitis HLS supports fixed-point types that allow fractional arithmetic to be easily handled. The advantage of fixed-point arithmetic is shown in the following example.

ap_fixed<11, 6> Var1 = 22.96875; // 11-bit signed word, 5 fractional bits
ap_ufixed<12,11> Var2 = 512.5; // 12-bit word, 1 fractional bit
ap_fixed<16,11> Res1; // 16-bit signed word, 5 fractional bits

Res1 = Var1 + Var2; // Result is 535.46875

Even though Var1 and Var2 have different precisions, the fixed-point type ensures that the decimal point is correctly aligned before the operation (an addition in this case) is performed. You are not required to perform any operations in the C code to align the decimal point.
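The alignment that the fixed-point type performs can be sketched in standard C++ without the Vitis HLS headers. The following is a minimal model, not the library implementation: FixedModel, to_double, and add_aligned are hypothetical names, and each value is held as an integer scaled by its number of fractional bits.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical model: the real value is raw * 2^-frac_bits.
struct FixedModel {
    std::int64_t raw;
    int frac_bits;
};

// Convert the model back to a double for inspection.
inline double to_double(FixedModel v) {
    return static_cast<double>(v.raw) /
           static_cast<double>(std::int64_t{1} << v.frac_bits);
}

// Add two values after rescaling both to the larger fractional width.
inline FixedModel add_aligned(FixedModel a, FixedModel b) {
    const int frac = std::max(a.frac_bits, b.frac_bits);
    const std::int64_t ra = a.raw * (std::int64_t{1} << (frac - a.frac_bits));
    const std::int64_t rb = b.raw * (std::int64_t{1} << (frac - b.frac_bits));
    return FixedModel{ra + rb, frac};
}
```

With Var1 modeled as {735, 5} (22.96875) and Var2 as {1025, 1} (512.5), add_aligned returns {17135, 5}, which is 535.46875.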

The type used to store the result of any fixed-point arithmetic operation must be large enough (in both the integer and fractional bits) to store the full result.

If this is not the case, the ap_fixed type performs:

  • overflow handling (when the result has more MSBs than the assigned type supports)
  • quantization, or rounding (when the result has more LSBs than the assigned type supports)

The ap_[u]fixed type provides various options on how the overflow and quantization are performed. The options are discussed below.

ap_[u]fixed Representation

In ap_[u]fixed types, a fixed-point value is represented as a sequence of bits with a specified position for the binary point.

  • Bits to the left of the binary point represent the integer part of the value.
  • Bits to the right of the binary point represent the fractional part of the value.

ap_[u]fixed type is defined as follows:

ap_[u]fixed<int W, 
 int I, 
 ap_q_mode Q, 
 ap_o_mode O,
 ap_sat_bits N>;
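The W and I parameters together fix the representable range and the resolution of a type. The sketch below computes both in standard C++; FixedRange and fixed_range are hypothetical helper names used only for illustration, and the formulas assume a two's-complement word for the signed case.

```cpp
#include <cmath>

struct FixedRange {
    double min;   // most negative representable value
    double max;   // most positive representable value
    double step;  // weight of the least significant bit, 2^-(W-I)
};

// Range and resolution of a W-bit word with I integer bits.
inline FixedRange fixed_range(int W, int I, bool is_signed) {
    const double step = std::ldexp(1.0, -(W - I));  // 2^-(W-I)
    const double span = std::ldexp(1.0, I);         // 2^I
    if (is_signed) {
        return FixedRange{-span / 2.0, span / 2.0 - step, step};
    }
    return FixedRange{0.0, span - step, step};
}
```

For example, fixed_range(11, 6, true) gives a step of 0.03125 and a range of -32 to 31.96875, and fixed_range(1, -1, false) gives the two values 0 and 0.25 listed for ap_ufixed<1, -1> in the identifier table.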

Quantization Modes

Rounding to plus infinity AP_RND
Rounding to zero AP_RND_ZERO
Rounding to minus infinity AP_RND_MIN_INF
Rounding to infinity AP_RND_INF
Convergent rounding AP_RND_CONV
Truncation AP_TRN
Truncation to zero AP_TRN_ZERO
AP_RND
  • Round the value to the nearest representable value for the specific ap_[u]fixed type.
    ap_fixed<3, 2, AP_RND, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.5
    ap_fixed<3, 2, AP_RND, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.0
AP_RND_ZERO
  • Round the value to the nearest representable value.
  • Round towards zero.
    • For positive values, delete the redundant bits.
    • For negative values, add the least significant bits to get the nearest representable value.
    ap_fixed<3, 2, AP_RND_ZERO, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.0
    ap_fixed<3, 2, AP_RND_ZERO, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.0
AP_RND_MIN_INF
  • Round the value to the nearest representable value.
  • Round towards minus infinity.
    • For positive values, delete the redundant bits.
    • For negative values, add the least significant bits.
    ap_fixed<3, 2, AP_RND_MIN_INF, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.0
    ap_fixed<3, 2, AP_RND_MIN_INF, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.5
AP_RND_INF
  • Round the value to the nearest representable value.
  • The rounding depends on the least significant bit.
    • For positive values, if the least significant bit is set, round towards plus infinity. Otherwise, round towards minus infinity.
    • For negative values, if the least significant bit is set, round towards minus infinity. Otherwise, round towards plus infinity.
    ap_fixed<3, 2, AP_RND_INF, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.5
    ap_fixed<3, 2, AP_RND_INF, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.5
AP_RND_CONV
  • Round to the nearest representable value with "ties" rounding to even, that is, the least significant bit (after rounding) is forced to zero.
  • A "tie" is the midpoint of two representable values and occurs when the bit following the least significant bit (after rounding) is 1 and all the bits below it are zero.
    // For the following examples, bit3 of the 8-bit value becomes the
    // LSB of the final 5-bit value (after rounding).
    // Notes: 
    //   * bit7 of the 8-bit value is the MSB (sign bit)
    //   * the 3 LSBs of the 8-bit value (bit2, bit1, bit0) are treated as
    //     guard, round and sticky bits.
    //   * See http://pages.cs.wisc.edu/~david/courses/cs552/S12/handouts/guardbits.pdf
    
    ap_fixed<8,3> p1 = 1.59375; // p1 = 001.10011
    ap_fixed<5,3,AP_RND_CONV> rconv1 = p1; // rconv1 = 1.5 (001.10)
    
    ap_fixed<8,3> p2 = 1.625; // p2 = 001.10100 => tie with bit3 (LSB-to-be) = 0
    ap_fixed<5,3,AP_RND_CONV> rconv2 = p2; // rconv2 = 1.5 (001.10) => lsb is already zero, just truncate
    
    ap_fixed<8,3> p3 = 1.375; // p3 = 001.01100 => tie with bit3 (LSB-to-be) = 1
    ap_fixed<5,3,AP_RND_CONV> rconv3 = p3; // rconv3 = 1.5 (001.10) => lsb is made zero by rounding up
    
    ap_fixed<8,3> p4 = 1.65625; // p4 = 001.10101
    ap_fixed<5,3,AP_RND_CONV> rconv4 = p4; // rconv4 = 1.75 (001.11) => round up
    
AP_TRN
  • Always round the value towards minus infinity.
    ap_fixed<3, 2, AP_TRN, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.0
    ap_fixed<3, 2, AP_TRN, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.5
AP_TRN_ZERO

Round the value as follows:

  • For positive values, the rounding is the same as mode AP_TRN.
  • For negative values, round towards zero.
    ap_fixed<3, 2, AP_TRN_ZERO, AP_SAT> UAPFixed4 = 1.25; // Yields: 1.0
    ap_fixed<3, 2, AP_TRN_ZERO, AP_SAT> UAPFixed4 = -1.25; // Yields: -1.0
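The seven quantization modes can be compared side by side with a small model in standard C++. This is a sketch of the behavior described above, not the ap_fixed implementation: QMode and quantize are hypothetical names, the value is a double, and frac_bits is the number of fractional bits kept in the destination.

```cpp
#include <cmath>

enum class QMode { RND, RND_ZERO, RND_MIN_INF, RND_INF, RND_CONV, TRN, TRN_ZERO };

// Quantize value to a grid of 2^-frac_bits using the given mode.
inline double quantize(double value, int frac_bits, QMode mode) {
    const double x = std::ldexp(value, frac_bits);  // value in units of one LSB
    double q = 0.0;
    switch (mode) {
    case QMode::RND:  // nearest, ties toward plus infinity
        q = std::floor(x + 0.5);
        break;
    case QMode::RND_ZERO:  // nearest, ties toward zero
        q = (x >= 0.0) ? std::ceil(x - 0.5) : std::floor(x + 0.5);
        break;
    case QMode::RND_MIN_INF:  // nearest, ties toward minus infinity
        q = std::ceil(x - 0.5);
        break;
    case QMode::RND_INF:  // nearest, ties away from zero
        q = (x >= 0.0) ? std::floor(x + 0.5) : std::ceil(x - 0.5);
        break;
    case QMode::RND_CONV: {  // nearest, ties to the even LSB
        const double lo = std::floor(x);
        const double rem = x - lo;
        if (rem > 0.5) {
            q = lo + 1.0;
        } else if (rem < 0.5) {
            q = lo;
        } else {
            q = (std::fmod(lo, 2.0) == 0.0) ? lo : lo + 1.0;
        }
        break;
    }
    case QMode::TRN:  // always toward minus infinity
        q = std::floor(x);
        break;
    case QMode::TRN_ZERO:  // always toward zero
        q = std::trunc(x);
        break;
    }
    return std::ldexp(q, -frac_bits);
}
```

Calling quantize(-1.25, 1, mode) reproduces the ap_fixed<3, 2> results listed for each mode, for example -1.0 for QMode::RND and -1.5 for QMode::TRN. The model ignores overflow, which is handled separately by the overflow mode.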

Overflow Modes

Saturation AP_SAT
Saturation to zero AP_SAT_ZERO
Symmetrical saturation AP_SAT_SYM
Wrap-around AP_WRAP
Sign magnitude wrap-around AP_WRAP_SM
AP_SAT

Saturate the value.

  • To the maximum value in case of overflow.
  • To the negative maximum value in case of negative overflow.
    ap_fixed<4, 4, AP_RND, AP_SAT> UAPFixed4 = 19.0; // Yields: 7.0
    ap_fixed<4, 4, AP_RND, AP_SAT> UAPFixed4 = -19.0; // Yields: -8.0
    ap_ufixed<4, 4, AP_RND, AP_SAT> UAPFixed4 = 19.0; // Yields: 15.0
    ap_ufixed<4, 4, AP_RND, AP_SAT> UAPFixed4 = -19.0; // Yields: 0.0
AP_SAT_ZERO

Force the value to zero in case of overflow, or negative overflow.

ap_fixed<4, 4, AP_RND, AP_SAT_ZERO> UAPFixed4 = 19.0; // Yields: 0.0
ap_fixed<4, 4, AP_RND, AP_SAT_ZERO> UAPFixed4 = -19.0; // Yields: 0.0
ap_ufixed<4, 4, AP_RND, AP_SAT_ZERO> UAPFixed4 = 19.0; // Yields: 0.0
ap_ufixed<4, 4, AP_RND, AP_SAT_ZERO> UAPFixed4 = -19.0; // Yields: 0.0
AP_SAT_SYM

Saturate the value:

  • To the maximum value in case of overflow.
  • To the minimum value in case of negative overflow.
    • Negative maximum for signed ap_fixed types
    • Zero for unsigned ap_ufixed types
    ap_fixed<4, 4, AP_RND, AP_SAT_SYM> UAPFixed4 = 19.0; // Yields: 7.0
    ap_fixed<4, 4, AP_RND, AP_SAT_SYM> UAPFixed4 = -19.0; // Yields: -7.0
    ap_ufixed<4, 4, AP_RND, AP_SAT_SYM> UAPFixed4 = 19.0; // Yields: 15.0
    ap_ufixed<4, 4, AP_RND, AP_SAT_SYM> UAPFixed4 = -19.0; // Yields: 0.0
AP_WRAP

Wrap the value around in case of overflow.

ap_fixed<4, 4, AP_RND, AP_WRAP> UAPFixed4 = 31.0; // Yields: -1.0
ap_fixed<4, 4, AP_RND, AP_WRAP> UAPFixed4 = -19.0; // Yields: -3.0
ap_ufixed<4, 4, AP_RND, AP_WRAP> UAPFixed4 = 19.0; // Yields: 3.0
ap_ufixed<4, 4, AP_RND, AP_WRAP> UAPFixed4 = -19.0; // Yields: 13.0

If the value of N is set to zero (the default overflow mode):

  • All MSB bits outside the range are deleted.
  • For unsigned numbers, the value wraps around to zero after the maximum.
  • For signed numbers, the value wraps to the minimum value after the maximum.

If N>0:

  • N MSBs are saturated or set to 1.
  • The sign bit is retained, so positive numbers remain positive and negative numbers remain negative.
  • The bits that are not saturated are copied starting from the LSB side.
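The AP_SAT, AP_SAT_ZERO, AP_SAT_SYM, and AP_WRAP (with N = 0) behaviors described above can be modeled for integer-valued words in standard C++. This is a hedged sketch rather than the library code: OMode and apply_overflow are hypothetical names, and the N > 0 wrap variants and AP_WRAP_SM are not modeled.

```cpp
#include <cstdint>

enum class OMode { SAT, SAT_ZERO, SAT_SYM, WRAP };

// Fit the integer v into a W-bit word (W < 62) using the given overflow mode.
inline std::int64_t apply_overflow(std::int64_t v, int W, bool is_signed, OMode mode) {
    const std::int64_t modulus = std::int64_t{1} << W;  // 2^W
    const std::int64_t max_v = is_signed ? modulus / 2 - 1 : modulus - 1;
    const std::int64_t min_v = is_signed ? -(modulus / 2) : 0;

    // Values already in the two's-complement range are returned unchanged.
    // How the library treats the single most negative value under
    // AP_SAT_SYM is not modeled here.
    if (v >= min_v && v <= max_v) {
        return v;
    }

    switch (mode) {
    case OMode::SAT:
        return (v > max_v) ? max_v : min_v;
    case OMode::SAT_ZERO:
        return 0;
    case OMode::SAT_SYM:
        return (v > max_v) ? max_v : (is_signed ? -max_v : 0);
    case OMode::WRAP: {
        // Keep the low W bits, then reinterpret them as signed if needed.
        std::int64_t r = ((v % modulus) + modulus) % modulus;
        if (is_signed && r > max_v) {
            r -= modulus;
        }
        return r;
    }
    }
    return v;  // not reached; keeps compilers quiet
}
```

For a 4-bit signed word, apply_overflow(-19, 4, true, mode) returns -8, 0, -7, and -3 for SAT, SAT_ZERO, SAT_SYM, and WRAP respectively, matching the ap_fixed<4, 4> examples.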
AP_WRAP_SM

The value should be sign-magnitude wrapped around.

ap_fixed<4, 4, AP_RND, AP_WRAP_SM> UAPFixed4 = 19.0; // Yields: -4.0
ap_fixed<4, 4, AP_RND, AP_WRAP_SM> UAPFixed4 = -19.0; // Yields: 2.0

If the value of N is set to zero (the default overflow mode):

  • This mode uses sign magnitude wrapping.
  • Sign bit set to the value of the least significant deleted bit.
  • If the most significant remaining bit is different from the original MSB, all the remaining bits are inverted.
  • If MSBs are same, the other bits are copied over.
    1. Delete redundant MSBs.
    2. The new sign bit is the least significant bit of the deleted bits. 0 in this case.
    3. Compare the new sign bit with the sign of the new value.
  • If different, invert all the numbers. They are different in this case.

If N>0:

  • Uses sign magnitude saturation
  • N MSBs are saturated to 1.
  • Behaves similar to a case in which N = 0, except that positive numbers stay positive and negative numbers stay negative.

Compiling ap_[u]fixed<> Types

To use the ap_[u]fixed<> classes, you must include the ap_fixed.h header file in all source files that reference ap_[u]fixed<> variables.

When compiling software models that use these classes, it may be necessary to specify the location of the Vitis HLS header files, for example by adding the “-I/<HLS_HOME>/include” option for g++ compilation.

Declaring and Defining ap_[u]fixed<> Variables

There are separate signed and unsigned classes:

  • ap_fixed<W,I> (signed)
  • ap_ufixed<W,I> (unsigned)

You can create user-defined types with the C/C++ typedef statement:


#include "ap_fixed.h" // use ap_[u]fixed<> types

typedef ap_ufixed<128,32> uint128_t; // 128-bit user defined type, 
 //  32 integer bits

User-Defined Types Examples

Initialization and Assignment from Constants (Literals)

You can initialize an ap_[u]fixed variable with normal floating-point constants of the usual C/C++ width:

  • 32 bits for type float
  • 64 bits for type double

That is, the constant is typically a single-precision or double-precision floating-point value.

Note that the value assigned to the fixed-point variable will be limited by the precision of the constant. Use string initialization as described in Initialization and Assignment from Constants (Literals) to ensure that all bits of the fixed-point variable are populated according to the precision described by the string.


#include <ap_fixed.h>

ap_ufixed<30, 15> my15BitInt = 3.1415;
ap_fixed<42, 23> my42BitInt = -1158.987;
ap_ufixed<99, 40> my99BitInt = 287432.0382911;
ap_fixed<36,30> my36BitInt = -0x123.456p-1;

The ap_[u]fixed types do not support initialization if they are used in an array of std::complex types.


typedef ap_fixed<DIN_W, 1, AP_TRN, AP_SAT> coeff_t; // MUST have IW >= 1
std::complex<coeff_t> twid_rom[REAL_SZ/2] = {{ 1, -0 },{ 0.9,-0.006 }, etc.}

The initialization values must first be cast to std::complex:


typedef ap_fixed<DIN_W, 1, AP_TRN, AP_SAT> coeff_t; // MUST have IW >= 1
std::complex<coeff_t> twid_rom[REAL_SZ/2] = {std::complex<coeff_t>( 1, -0 ), 
std::complex<coeff_t>(0.9,-0.006 ),etc.}

Support for Console I/O (Printing)

As with initialization and assignment to ap_[u]fixed<> variables, Vitis HLS supports printing values that require more than 64 bits to represent.

The easiest way to output any value stored in an ap_[u]fixed variable is to use the C++ standard output stream, std::cout (#include <iostream> or <iostream.h>). The stream insertion operator, "<<", is overloaded to correctly output the full range of values possible for any given ap_[u]fixed variable. The following stream manipulators are also supported, allowing formatting of the value as shown.

  • dec (decimal)
  • hex (hexadecimal)
  • oct (octal)
    #include <iostream>
    #include "ap_fixed.h"
    
    ap_fixed<6,3, AP_RND, AP_WRAP> Val = 3.25;
    
    std::cout << Val << std::endl;     // Yields: 3.25
Using the Standard C Library

You can also use the standard C library (#include <stdio.h>) to print out values larger than 64 bits:

  1. Convert the value to a C++ std::string using the ap_[u]fixed class method to_string().
  2. Convert the result to a null-terminated C character string using the std::string class method c_str().
Optional Argument One (Specifying the Radix)

You can pass the ap_[u]fixed::to_string() method an optional argument specifying the radix of the numerical format desired. The valid radix argument values are:

  • 2 (binary)
  • 8 (octal)
  • 10 (decimal)
  • 16 (hexadecimal) (default)
Optional Argument Two (Printing as Signed Values)

A second optional argument to ap_[u]fixed::to_string() specifies whether to print the non-decimal formats as signed values. This argument is boolean. The default value is false, causing the non-decimal formats to be printed as unsigned values.

ap_fixed<6,3, AP_RND, AP_WRAP> Val = 3.25;

printf("%s \n", Val.to_string().c_str()); // Yields: 0b011.010
printf("%s \n", Val.to_string(10).c_str()); // Yields: 3.25
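To see where a binary string such as 0b011.010 comes from, the bit pattern can be formatted by hand. The sketch below is a standard C++ model, not the to_string() implementation: to_binary_string is a hypothetical helper, raw is the stored value multiplied by 2^(W-I), and the code assumes 0 < I < W.

```cpp
#include <cstdint>
#include <string>

// Format the low W bits of raw as "0b<integer bits>.<fractional bits>".
inline std::string to_binary_string(std::int64_t raw, int W, int I) {
    // Converting to unsigned yields the two's-complement bit pattern.
    const std::uint64_t bits = static_cast<std::uint64_t>(raw);
    std::string out = "0b";
    for (int bit = W - 1; bit >= 0; --bit) {
        out += (((bits >> bit) & 1u) != 0u) ? '1' : '0';
        // The binary point sits after the lowest integer bit.
        if (bit == W - I && bit != 0) {
            out += '.';
        }
    }
    return out;
}
```

For Val = 3.25 stored in a 6-bit word with 3 integer bits, raw is 26 (3.25 × 8), and to_binary_string(26, 6, 3) returns "0b011.010".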

The ap_[u]fixed types are supported by the following C++ manipulator functions:

  • setprecision
  • setw
  • setfill

The setprecision manipulator sets the decimal precision to be used. It takes one parameter n as the value of decimal precision, where n specifies the maximum number of meaningful digits to display in total (counting both those before and those after the decimal point).

The default value of n is 6, which is consistent with the native C float type.

ap_fixed<64, 32> f =3.14159;
cout << setprecision (5) << f << endl;
cout << setprecision (9) << f << endl;
f = 123456;
cout << setprecision (5) << f << endl;

The example above displays the following results where the printed results are rounded when the actual precision exceeds the specified precision:

   3.1416
   3.14159
   1.2346e+05

The setw manipulator:

  • Sets the number of characters to be used for the field width.
  • Takes one parameter w as the value of the width

    where

    • w determines the minimum number of characters to be written in some output representation.

If the standard width of the representation is shorter than the field width, the representation is padded with fill characters. Fill characters are controlled by the setfill manipulator which takes one parameter f as the padding character.

For example, given:

    ap_fixed<65,32> aa = 123456;
    int precision = 5;
    cout<<setprecision(precision)<<setw(13)<<setfill('T')<<aa<<endl;

The output is:

     TTT1.2346e+05
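The same manipulators can be tried on a native double to preview the formatting without the Vitis HLS headers. The sketch below assumes, as the text above states, that ap_[u]fixed output follows the native floating-point formatting rules; format_value is a hypothetical helper.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format value with the given precision, minimum field width, and fill character.
inline std::string format_value(double value, int precision, int width, char fill) {
    std::ostringstream os;
    os << std::setprecision(precision) << std::setw(width) << std::setfill(fill) << value;
    return os.str();
}
```

format_value(3.14159, 5, 0, ' ') returns "3.1416", and format_value(123456.0, 5, 13, 'T') returns "TTT1.2346e+05": the ten-character representation is padded on the left with three fill characters to reach the field width of 13.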

Expressions Involving ap_[u]fixed<> types

Arbitrary precision fixed-point values can participate in expressions that use any operators supported by C/C++. After an arbitrary precision fixed-point type or variable is defined, its usage is the same as for any floating-point type or variable in the C/C++ languages.

Observe the following caveats:

  • Zero and Sign Extensions

    All values of smaller bit-width are zero or sign-extended depending on the sign of the source value. You may need to insert casts to obtain alternative signs when assigning smaller bit-widths to larger.

  • Truncations

    Truncation occurs when you assign an arbitrary precision fixed-point value of larger bit-width than the destination variable.

Class Methods, Operators, and Data Members

In general, any valid operation that can be done on a native C/C++ integer data type is supported (using operator overloading) for ap_[u]fixed types. In addition to these overloaded operators, some class specific operators and methods are included to ease bit-level operations.

Binary Arithmetic Operators
Addition
ap_[u]fixed::RType ap_[u]fixed::operator + (ap_[u]fixed op)

Adds an arbitrary precision fixed-point value and a given operand op.

The operand can be any of the following types:

  • ap_[u]fixed
  • ap_[u]int
  • C/C++ integer types

The result type ap_[u]fixed::RType depends on the type information of the two operands.

ap_fixed<76, 63> Result;

ap_fixed<5, 2> Val1 = 1.125;
ap_fixed<75, 62> Val2 = 6721.35595703125;

Result = Val1 + Val2; //Yields 6722.480957

Because Val2 has the larger bit-width in both the integer part and the fraction part, the result type uses those bit-widths plus one additional integer bit so that it can store all possible result values.

When you use the power functions, specifying the data width controls resource usage, as shown below. In similar cases, Xilinx recommends specifying the width of the stored result instead of specifying the width of the fixed-point operations.

ap_ufixed<16,6> x = 5;
ap_ufixed<16,7> y = hls::rsqrt<16,6>(x+x);
Subtraction
ap_[u]fixed::RType ap_[u]fixed::operator - (ap_[u]fixed op)

Subtracts a given operand op from an arbitrary precision fixed-point value.

The result type ap_[u]fixed::RType depends on the type information of the two operands.

ap_fixed<76, 63> Result;

ap_fixed<5, 2> Val1 = 1625.153;          // Wraps to 1.125: ap_fixed<5, 2> holds [-2, 2)
ap_fixed<75, 62> Val2 = 6721.355992351;  // Stored as 6721.35595703125

Result = Val2 - Val1; // Yields 6720.230957

Because Val2 has the larger bit-width in both the integer part and the fraction part, the result type uses those bit-widths plus one additional integer bit so that it can store all possible result values.

Multiplication
ap_[u]fixed::RType ap_[u]fixed::operator * (ap_[u]fixed op)

Multiplies an arbitrary precision fixed-point value by a given operand op.

ap_fixed<80, 64> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val1 * Val2; // Yields 7561.525452

This shows the multiplication of Val1 and Val2. The result type is the sum of their integer part bit-width and their fraction part bit width.
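The result-width rules described for addition, subtraction, and multiplication can be written down as a short calculation. This is a sketch in standard C++ under the assumption that both operands are signed; FixedWidth, frac_bits, add_result, and mul_result are hypothetical names, not part of the ap_fixed API.

```cpp
#include <algorithm>

struct FixedWidth {
    int W;  // total word length in bits
    int I;  // integer bits
};

inline int frac_bits(FixedWidth t) { return t.W - t.I; }

// Addition and subtraction: widest integer part plus one growth bit,
// combined with the widest fraction part.
inline FixedWidth add_result(FixedWidth a, FixedWidth b) {
    const int i = std::max(a.I, b.I) + 1;
    const int f = std::max(frac_bits(a), frac_bits(b));
    return FixedWidth{i + f, i};
}

// Multiplication: integer widths add and fraction widths add.
inline FixedWidth mul_result(FixedWidth a, FixedWidth b) {
    return FixedWidth{a.W + b.W, a.I + b.I};
}
```

For operands of types ap_fixed<5, 2> and ap_fixed<75, 62>, add_result gives {76, 63} and mul_result gives {80, 64}, which are the Result types declared in the addition and multiplication examples.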

Division
ap_[u]fixed::RType ap_[u]fixed::operator / (ap_[u]fixed op)

Divides an arbitrary precision fixed-point by a given operand op.

ap_fixed<84, 66> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val2 / Val1; // Yields 5974.538628

This shows the division of Val2 and Val1. To preserve enough precision:

  • The integer bit-width of the result type is the sum of the integer bit-width of Val2 and the fraction bit-width of Val1.
  • The fraction bit-width of the result type is equal to the fraction bit-width of Val2.
Bitwise Logical Operators
Bitwise OR
ap_[u]fixed::RType ap_[u]fixed::operator | (ap_[u]fixed op)

Applies a bitwise OR operation on an arbitrary precision fixed-point value and a given operand op.

ap_fixed<75, 62> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val1 | Val2; // Yields 6721.480957
Bitwise AND
ap_[u]fixed::RType ap_[u]fixed::operator & (ap_[u]fixed op)

Applies a bitwise AND operation on an arbitrary precision fixed-point value and a given operand op.

ap_fixed<75, 62> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val1 & Val2;  // Yields 1.00000
Bitwise XOR
ap_[u]fixed::RType ap_[u]fixed::operator ^ (ap_[u]fixed op)

Applies an xor bitwise operation on an arbitrary precision fixed-point and a given operand op.

ap_fixed<75, 62> Result;

ap_fixed<5, 2> Val1 = 1625.153;
ap_fixed<75, 62> Val2 = 6721.355992351;

Result = Val1 ^ Val2; // Yields 6720.480957
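The bitwise results above can be checked by aligning both operands to the same number of fractional bits and operating on the integer bit patterns. The following standard C++ sketch is a model only: BitOp and bitwise_fixed are hypothetical names, and the inputs are the values actually stored in Val1 (1.125) and Val2 (6721.35595703125), aligned to 13 fractional bits.

```cpp
#include <cmath>
#include <cstdint>

enum class BitOp { OR, AND, XOR };

// Apply a bitwise operation to two values aligned to frac_bits fractional bits.
// Both values must be exactly representable with frac_bits fractional bits.
inline double bitwise_fixed(double a, double b, int frac_bits, BitOp op) {
    const std::int64_t ra = static_cast<std::int64_t>(std::ldexp(a, frac_bits));
    const std::int64_t rb = static_cast<std::int64_t>(std::ldexp(b, frac_bits));
    std::int64_t r = 0;
    switch (op) {
    case BitOp::OR:  r = ra | rb; break;
    case BitOp::AND: r = ra & rb; break;
    case BitOp::XOR: r = ra ^ rb; break;
    }
    return std::ldexp(static_cast<double>(r), -frac_bits);
}
```

With frac_bits set to 13, the OR of the two stored values is 6721.48095703125, the AND is 1.0, and the XOR is 6720.48095703125.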
Increment and Decrement Operators
Pre-Increment
ap_[u]fixed ap_[u]fixed::operator ++ ()

This prefix operator increases an arbitrary precision fixed-point variable by 1 and returns the new value.

ap_fixed<25, 8> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = ++Val1; // Yields 6.125000
Post-Increment
ap_[u]fixed ap_[u]fixed::operator ++ (int)

This operator function postfix:

  • Increases an arbitrary precision fixed-point variable by 1.
  • Returns the original value of this arbitrary precision fixed-point variable.
    ap_fixed<25, 8> Result;
    ap_fixed<8, 5> Val1 = 5.125;
    
    Result = Val1++; // Yields 5.125000
Pre-Decrement
ap_[u]fixed ap_[u]fixed::operator -- ()

This prefix operator decreases this arbitrary precision fixed-point variable by 1 and returns the new value.

ap_fixed<25, 8> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = --Val1; // Yields 4.125000
Post-Decrement
ap_[u]fixed ap_[u]fixed::operator -- (int)

This operator function postfix:

  • Decreases this arbitrary precision fixed-point variable by 1.
  • Returns the original value of this arbitrary precision fixed-point variable.
    ap_fixed<25, 8> Result;
    ap_fixed<8, 5> Val1 = 5.125;
    
    Result = Val1--; // Yields 5.125000
Unary Operators
Addition
ap_[u]fixed ap_[u]fixed::operator + ()

Returns a self copy of an arbitrary precision fixed-point variable.

ap_fixed<25, 8> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = +Val1;  // Yields 5.125000
Subtraction
ap_[u]fixed::RType ap_[u]fixed::operator - ()

Returns a negative value of an arbitrary precision fixed-point variable.

ap_fixed<25, 8> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = -Val1; // Yields -5.125000
Equality Zero
bool ap_[u]fixed::operator ! ()

This operator function:

  • Compares an arbitrary precision fixed-point variable with 0,
  • Returns the result.
    bool  Result;
    ap_fixed<8, 5> Val1 = 5.125;
    
    Result = !Val1; // Yields false
Bitwise Inverse
ap_[u]fixed::RType ap_[u]fixed::operator ~ ()

Returns a bitwise complement of an arbitrary precision fixed-point variable.

ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val1 = 5.125;

Result = ~Val1; // Yields -5.25
Shift Operators
Unsigned Shift Left
ap_[u]fixed ap_[u]fixed::operator << (ap_uint<_W2> op) 

This operator function:

  • Shifts left by a given integer operand.
  • Returns the result.

The operand can be a C/C++ integer type:

  • char
  • short
  • int
  • long

The return type of the shift left operation is the same width as the type being shifted.

Note: Shift does not support overflow or quantization modes.
ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val = 5.375;

ap_uint<4> sh = 2;

Result = Val << sh; // Yields -10.5

The bit-width of Result is (W = 25, I = 15). However, because the result type of the shift left operation is the same as the type of Val:

  • The high order two bits of Val are shifted out.
  • The result is -10.5.

If a result of 21.5 is required, Val must be cast to ap_fixed<10, 7> first, for example ap_fixed<10, 7>(Val).
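The effect of the operand width on a shift left can be reproduced with integer bit patterns. This is a minimal sketch in standard C++, not the library operator: wrap_signed and shift_left_in_width are hypothetical helpers, and raw is the value multiplied by 2^3 because Val has three fractional bits.

```cpp
#include <cstdint>

// Keep the low W bits of raw and reinterpret them as a signed W-bit value.
inline std::int64_t wrap_signed(std::int64_t raw, int W) {
    const std::int64_t modulus = std::int64_t{1} << W;
    std::int64_t r = ((raw % modulus) + modulus) % modulus;
    if (r >= modulus / 2) {
        r -= modulus;
    }
    return r;
}

// Shift left by sh bits (sh >= 0) inside a W-bit word; bits shifted out are lost.
inline std::int64_t shift_left_in_width(std::int64_t raw, int W, int sh) {
    return wrap_signed(raw * (std::int64_t{1} << sh), W);
}
```

Val = 5.375 has raw = 43. In an 8-bit word, shift_left_in_width(43, 8, 2) returns -84, which is -10.5 after dividing by 8. In a 10-bit word, the same shift returns 172, which is 21.5.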

Signed Shift Left
ap_[u]fixed ap_[u]fixed::operator << (ap_int<_W2> op)

This operator:

  • Shifts left by a given integer operand.
  • Returns the result.

The shift direction depends on whether the operand is positive or negative.

  • If the operand is positive, a shift left is performed.
  • If the operand is negative, a shift right (opposite direction) is performed.

The operand can be a C/C++ integer type:

  • char
  • short
  • int
  • long

The return type of the shift left operation is the same width as the type being shifted.

ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val = 5.375;

ap_int<4> Sh = 2;
Result = Val << Sh; // Shift left, yields -10.5

Sh = -2;
Result = Val << Sh; // Shift right, yields 1.25
Unsigned Shift Right
ap_[u]fixed ap_[u]fixed::operator >> (ap_uint<_W2> op) 

This operator function:

  • Shifts right by a given integer operand.
  • Returns the result.

The operand can be a C/C++ integer type:

  • char
  • short
  • int
  • long

The return type of the shift right operation is the same width as the type being shifted.

ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val = 5.375;

ap_uint<4> sh = 2;

Result = Val >> sh; // Yields 1.25

If it is necessary to preserve all significant bits, extend the fraction part bit-width of Val first, for example ap_fixed<10, 5>(Val).

Signed Shift Right
ap_[u]fixed ap_[u]fixed::operator >> (ap_int<_W2> op) 

This operator:

  • Shifts right by a given integer operand.
  • Returns the result.

The shift direction depends on whether operand is positive or negative.

  • If the operand is positive, a shift right is performed.
  • If the operand is negative, a shift left (opposite direction) is performed.

The operand can be a C/C++ integer type (char, short, int, or long).

The return type of the shift right operation is the same width as the type being shifted. For example:

ap_fixed<25, 15> Result;
ap_fixed<8, 5> Val = 5.375;

ap_int<4> Sh = 2;
Result = Val >> Sh; // Shift right, yields 1.25

Sh = -2;
Result = Val >> Sh; // Shift left,  yields -10.5
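The direction reversal for negative shift amounts can be modeled in standard C++ as follows. This is a sketch under stated assumptions, not the library operators: the helper names are hypothetical, raw is the value multiplied by 2^3 for a type with three fractional bits, and a right shift is assumed to discard the low bits (rounding toward minus infinity).

```cpp
#include <cstdint>

// Keep the low W bits of raw and reinterpret them as a signed W-bit value.
inline std::int64_t wrap_to_signed_width(std::int64_t raw, int W) {
    const std::int64_t modulus = std::int64_t{1} << W;
    std::int64_t r = ((raw % modulus) + modulus) % modulus;
    if (r >= modulus / 2) {
        r -= modulus;
    }
    return r;
}

// Shift left by sh bits; a negative sh shifts right instead.
inline std::int64_t shift_left_by(std::int64_t raw, int W, int sh) {
    if (sh >= 0) {
        return wrap_to_signed_width(raw * (std::int64_t{1} << sh), W);
    }
    const std::int64_t divisor = std::int64_t{1} << (-sh);
    std::int64_t q = raw / divisor;
    if (raw % divisor != 0 && raw < 0) {
        --q;  // floor, as an arithmetic shift right would produce
    }
    return q;
}

// Shift right by sh bits; a negative sh shifts left instead.
inline std::int64_t shift_right_by(std::int64_t raw, int W, int sh) {
    return shift_left_by(raw, W, -sh);
}
```

For Val = 5.375 (raw = 43) in an 8-bit word, shift_right_by(43, 8, 2) returns 10 (1.25) and shift_right_by(43, 8, -2) returns -84 (-10.5), the same pair of results that shift_left_by produces for -2 and 2.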

Relational Operators
Equality
bool ap_[u]fixed::operator == (ap_[u]fixed op)

This operator compares the arbitrary precision fixed-point variable with a given operand.

Returns true if they are equal and false if they are not equal.

The type of operand op can be ap_[u]fixed, ap_int or C/C++ integer types. For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 == Val2; // Yields  true
Result = Val1 == Val3; // Yields  false
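The first comparison yields true because 17.25 does not fit in the four integer bits of ap_fixed<9, 4>, so the stored value wraps. The sketch below models storage with the default AP_TRN and AP_WRAP modes in standard C++; store_default_modes is a hypothetical helper, not a library function.

```cpp
#include <cmath>
#include <cstdint>

// Value held by a W-bit word with I integer bits after assigning `value`
// with truncation (AP_TRN) and wrap-around (AP_WRAP).
inline double store_default_modes(double value, int W, int I, bool is_signed) {
    const int frac_bits = W - I;
    const std::int64_t modulus = std::int64_t{1} << W;
    // AP_TRN: drop the bits below the LSB (round toward minus infinity).
    std::int64_t raw = static_cast<std::int64_t>(std::floor(std::ldexp(value, frac_bits)));
    // AP_WRAP: keep only the low W bits.
    raw = ((raw % modulus) + modulus) % modulus;
    if (is_signed && raw >= modulus / 2) {
        raw -= modulus;
    }
    return std::ldexp(static_cast<double>(raw), -frac_bits);
}
```

store_default_modes(17.25, 9, 4, true) returns 1.25, which equals the 1.25 held in Val1, while store_default_modes(3.25, 10, 5, true) returns 3.25 unchanged.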
Inequality
bool ap_[u]fixed::operator != (ap_[u]fixed op)

This operator compares this arbitrary precision fixed-point variable with a given operand.

Returns true if they are not equal and false if they are equal.

The type of operand op can be:

  • ap_[u]fixed
  • ap_int
  • C or C++ integer types

For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 != Val2; // Yields false
Result = Val1 != Val3; // Yields true
Greater than or equal to
bool ap_[u]fixed::operator >= (ap_[u]fixed op)

This operator compares a variable with a given operand.

Returns true if the variable is greater than or equal to the operand and false otherwise.

The type of operand op can be ap_[u]fixed, ap_int or C/C++ integer types.

For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 >= Val2; // Yields true
Result = Val1 >= Val3; // Yields false
Less than or equal to
bool ap_[u]fixed::operator <= (ap_[u]fixed op)

This operator compares a variable with a given operand, and returns true if it is less than or equal to the operand and false if not.

The type of operand op can be ap_[u]fixed, ap_int or C/C++ integer types.

For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 <= Val2; // Yields true
Result = Val1 <= Val3; // Yields true
Greater than
bool ap_[u]fixed::operator > (ap_[u]fixed op)

This operator compares a variable with a given operand, and returns true if it is greater than the operand and false if not.

The type of operand op can be ap_[u]fixed, ap_int, or C/C++ integer types.

For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 > Val2; // Yields false
Result = Val1 > Val3; // Yields false
Less than
bool ap_[u]fixed::operator < (ap_[u]fixed op)

This operator compares a variable with a given operand, and returns true if it is less than the operand and false if not.

The type of operand op can be ap_[u]fixed, ap_int, or C/C++ integer types. For example:

bool Result;

ap_ufixed<8, 5> Val1 = 1.25;
ap_fixed<9, 4> Val2 = 17.25;
ap_fixed<10, 5> Val3 = 3.25;

Result = Val1 < Val2; // Yields false
Result = Val1 < Val3; // Yields true
Bit Operator
Bit-Select and Set
af_bit_ref ap_[u]fixed::operator [] (int bit) 

This operator selects one bit from an arbitrary precision fixed-point value and returns it.

The returned value is a reference value that can set or clear the corresponding bit in the ap_[u]fixed variable. The bit argument must be an integer value and it specifies the index of the bit to select. The least significant bit has index 0. The highest permissible index is one less than the bit-width of this ap_[u]fixed variable.

The result type is af_bit_ref with a value of either 0 or 1. For example:

ap_fixed<8, 5> Value = 1.375;

Value[3]; // Yields  1
Value[4]; // Yields  0

Value[2] = 1; // Yields 1.875
Value[3] = 0; // Yields 0.875
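The bit indices can be checked against the raw bit pattern. The following plain C++ sketch is not Vitis HLS code: get_bit, set_bit, to_value, and bit_select_demo are hypothetical names, and the model assumes an 8-bit word with 3 fractional bits holding a non-negative value, so that 1.375 is stored as 00001.011.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not part of ap_fixed.h): the 8 raw bits of a value
// with 3 fractional bits, held in a uint8_t.
constexpr bool get_bit(std::uint8_t raw, int index) {
    return ((raw >> index) & 1u) != 0;
}

constexpr std::uint8_t set_bit(std::uint8_t raw, int index, bool bit) {
    return bit ? static_cast<std::uint8_t>(raw | (1u << index))
               : static_cast<std::uint8_t>(raw & ~(1u << index));
}

// Interprets the raw bits as a non-negative value with 3 fractional bits.
constexpr double to_value(std::uint8_t raw) { return raw / 8.0; }

inline void bit_select_demo() {
    std::uint8_t raw = 11;              // 00001.011 = 1.375
    assert(get_bit(raw, 3));            // lowest integer bit is 1
    assert(!get_bit(raw, 4));           // next integer bit is 0

    raw = set_bit(raw, 2, true);        // 00001.111
    assert(to_value(raw) == 1.875);
    raw = set_bit(raw, 3, false);       // 00000.111
    assert(to_value(raw) == 0.875);
}
```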
Bit Range
af_range_ref ap_[u]fixed::range (unsigned Hi, unsigned Lo)
af_range_ref ap_[u]fixed::operator () (unsigned Hi, unsigned Lo)

This operation is similar to bit-select operator [] except that it operates on a range of bits instead of a single bit.

It selects a group of bits from the arbitrary precision fixed-point variable. The Hi argument specifies the highest bit to be selected. The Lo argument specifies the lowest bit to be selected. If Lo is larger than Hi, the selected bits are returned in reverse order.

The return type af_range_ref represents a reference to the range of bits of the ap_[u]fixed variable specified by Hi and Lo. For example:

ap_uint<4> Result = 0;
ap_ufixed<4, 2> Value = 1.25; // Bits 01.01
ap_uint<8> Repl = 0xAA;

Result = Value.range(3, 0); // Yields: 0x5

// When Lo > Hi, the bits are returned in reverse order
Result = Value.range(0, 3); // Yields: 0xA

Value(3, 0) = Repl(3, 0); // Value bits become 10.10, yields: 2.5
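The bit patterns involved in a range access can be worked out by hand. The plain C++ sketch below is not Vitis HLS code: reverse_bits, value_of, and range_demo are hypothetical names, and the model assumes a 4-bit word with 2 fractional bits. It also shows that the same pattern 0xA reads as 2.5 in an unsigned type and as -1.5 in a signed type.

```cpp
#include <cassert>

// Hypothetical helper (not part of ap_fixed.h): reverse the order of the
// lowest `width` bits, as a model of a range select with Lo > Hi.
inline unsigned reverse_bits(unsigned bits, int width) {
    unsigned out = 0;
    for (int i = 0; i < width; ++i) {
        out = (out << 1) | ((bits >> i) & 1u);
    }
    return out;
}

// Value of a 4-bit pattern with 2 fractional bits, read as unsigned
// (like ap_ufixed<4, 2>) or as signed two's complement (like ap_fixed<4, 2>).
inline double value_of(unsigned bits, bool is_signed) {
    int raw = static_cast<int>(bits & 0xFu);
    if (is_signed && raw >= 8) {
        raw -= 16;
    }
    return raw / 4.0;
}

inline void range_demo() {
    assert(value_of(0x5, false) == 1.25);   // bits 01.01
    assert(reverse_bits(0x5, 4) == 0xA);    // 0101 reversed is 1010
    assert(value_of(0xA, false) == 2.5);    // bits 10.10, unsigned
    assert(value_of(0xA, true) == -1.5);    // bits 10.10, signed
}
```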
Range Select
af_range_ref ap_[u]fixed::range ()
af_range_ref ap_[u]fixed::operator () ()

This operation is a special case of the range select operator (). It selects all bits from this arbitrary precision fixed-point value in the normal order.

The return type af_range_ref represents a reference to the range specified by Hi = W - 1 and Lo = 0. For example:

ap_uint<4> Result = 0;

ap_ufixed<4, 2> Value = 1.25;
ap_uint<8> Repl = 0xAA;

Result = Value.range(); // Yields: 0x5
Value() = Repl(3, 0); // Value bits become 10.10, yields: 2.5
Length
int ap_[u]fixed::length ()

This function returns an integer value that provides the number of bits in an arbitrary precision fixed-point value. It can be used with a type or a value. For example:

ap_ufixed<128, 64> My128APFixed;

int bitwidth = My128APFixed.length(); // Yields 128
Explicit Conversion Methods
Fixed to Double
double ap_[u]fixed::to_double ()

This member function returns this fixed-point value in IEEE double-precision format. For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
double Result;

Result = MyAPFixed.to_double(); // Yields 333.789
Fixed to Float
float ap_[u]fixed::to_float()

This member function returns this fixed-point value in IEEE single-precision (float) format. For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
float Result;

Result = MyAPFixed.to_float();  // Yields 333.789
Fixed to Half-Precision Floating Point
half ap_[u]fixed::to_half()

This member function returns this fixed-point value in the HLS half-precision (16-bit) floating-point format. For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
half Result;

Result = MyAPFixed.to_half();  // Yields 333.75, the closest half-precision value
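A 16-bit half-precision value carries 11 significant bits, so it cannot hold 333.789 exactly. The plain C++ sketch below estimates the closest representable value. It is not Vitis HLS code: round_to_significant_bits and half_precision_demo are hypothetical names, the model assumes round-to-nearest, and it ignores subnormal values and overflow.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of ap_fixed.h): round x to the nearest value
// that can be written with `bits` significant binary digits.
inline double round_to_significant_bits(double x, int bits) {
    int exp = 0;
    std::frexp(x, &exp);                        // x = m * 2^exp, 0.5 <= |m| < 1
    const double step = std::ldexp(1.0, exp - bits);
    return std::round(x / step) * step;
}

inline void half_precision_demo() {
    // Between 256 and 512 the spacing of 11-bit values is 0.25.
    assert(round_to_significant_bits(333.789, 11) == 333.75);
    // A value that needs only a few bits is unchanged.
    assert(round_to_significant_bits(5.375, 11) == 5.375);
}
```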
Fixed to ap_int
ap_int ap_[u]fixed::to_ap_int ()

This member function explicitly converts this fixed-point value to an ap_int that captures all integer bits (fraction bits are truncated). For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
ap_uint<77> Result;

Result = MyAPFixed.to_ap_int(); //Yields 333
Fixed to Integer
int ap_[u]fixed::to_int ()
unsigned ap_[u]fixed::to_uint ()
ap_slong ap_[u]fixed::to_int64 ()
ap_ulong ap_[u]fixed::to_uint64 ()

These member functions explicitly convert this fixed-point value to C built-in integer types. For example:

ap_ufixed<256, 77> MyAPFixed = 333.789;
unsigned int Result;

Result = MyAPFixed.to_uint(); // Yields 333

unsigned long long Result64;
Result64 = MyAPFixed.to_uint64(); // Yields 333
Note: Xilinx recommends that you explicitly call member functions instead of using a C-style cast to convert ap_[u]fixed to other data types.
Compile Time Access to Data Type Attributes

The ap_[u]fixed<> types are provided with several static members that allow the size and configuration of data types to be determined at compile time. The data type is provided with the static const members: width, iwidth, qmode and omode:

static const int width = _AP_W;
static const int iwidth = _AP_I;
static const ap_q_mode qmode = _AP_Q;
static const ap_o_mode omode = _AP_O;

You can use these data members to extract the following information from any existing ap_[u]fixed<> data type:

  • width: The width of the data type.
  • iwidth: The width of the integer part of the data type.
  • qmode: The quantization mode of the data type.
  • omode: The overflow mode of the data type.

For example, you can use these data members to extract the data width of an existing ap_[u]fixed<> data type to create another ap_[u]fixed<> data type at compile time.

The following example shows how the size of variable Res is automatically defined as 1 bit wider than variables Val1 and Val2, with the same quantization and overflow modes:

#include "ap_fixed.h"

// Definition of basic data type
#define INPUT_DATA_WIDTH 12
#define IN_INTG_WIDTH 6
#define IN_QMODE AP_RND_ZERO
#define IN_OMODE AP_WRAP
typedef ap_fixed<INPUT_DATA_WIDTH, IN_INTG_WIDTH, IN_QMODE, IN_OMODE> data_t;
// Definition of variables
data_t Val1, Val2;
// Res is automatically sized at compile time to be 1 bit wider than INPUT_DATA_WIDTH
// The bit growth in Res will be in the integer bits
ap_fixed<data_t::width+1, data_t::iwidth+1, data_t::qmode, data_t::omode> Res = Val1 + Val2;

This ensures that Vitis HLS correctly models the bit-growth caused by the addition even if you update the value of INPUT_DATA_WIDTH, IN_INTG_WIDTH, or the quantization modes for data_t.
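The same compile-time pattern can be tried without the Vitis HLS headers. The sketch below uses a hypothetical stand-in, mock_fixed, which exposes only the width and iwidth members described above and performs no arithmetic. The names mock_fixed, sum_t, and bit_growth_demo are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for ap_fixed<> (not part of ap_fixed.h). It exposes
// only the static members width and iwidth and stores no data.
template <int W, int I>
struct mock_fixed {
    static const int width = W;
    static const int iwidth = I;
};

typedef mock_fixed<12, 6> data_t;

// Derive a type with one extra integer bit from the attributes of data_t.
typedef mock_fixed<data_t::width + 1, data_t::iwidth + 1> sum_t;

// The derived sizes are fixed at compile time.
static_assert(sum_t::width == 13, "one bit wider than data_t");
static_assert(sum_t::iwidth == 7, "growth is in the integer bits");

inline void bit_growth_demo() {
    // The number of fractional bits is unchanged.
    const int data_frac = data_t::width - data_t::iwidth;
    const int sum_frac = sum_t::width - sum_t::iwidth;
    assert(data_frac == 6);
    assert(sum_frac == data_frac);
}
```

Changing the template arguments of data_t updates sum_t automatically, which is the property the static members are intended to provide.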