Intrinsics for moving values from accumulator data-types to vector data-types.

Overview

Moving data from accumulator data-types back to standard vector data-types requires a reduction in precision. For fixed-point arithmetic, an appropriate transformation involving shifting out lower-order bits, rounding and/or saturation can be applied using the SRS family of intrinsics. The shift amount is passed as a parameter in the range -1 to 62 (encoded as 0..63 in the instruction), while rounding and saturation are applied according to the global mode registers of the processor (see Mode Settings).
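The shift, round and saturate steps can be illustrated with a small scalar model of a single lane. This is only a sketch, not the hardware definition: `encode_shift` and `srs_lane_model` are hypothetical names, the +1 offset for the encoded shift is inferred from the stated ranges, and round-half-up with saturation enabled is an assumed mode setting (the real behaviour follows the mode registers). Only shifts in 0..62 are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: maps the intrinsic-level shift (-1..62) to the
// 0..63 instruction field, assuming a plain +1 offset.
inline int encode_shift(int shft) {
    assert(shft >= -1 && shft <= 62);
    return shft + 1;
}

// Hypothetical scalar model of one lane of a 48-bit to 16-bit srs.
// Assumes round-half-up and saturation enabled; the actual rounding and
// saturation behaviour is selected by the processor's mode registers.
inline std::int16_t srs_lane_model(std::int64_t acc, int shft) {
    assert(shft >= 0 && shft <= 62);  // negative shifts are not modelled here
    // Rounding: add half of the weight of the bits about to be discarded.
    const std::int64_t half = (shft > 0) ? (std::int64_t{1} << (shft - 1)) : 0;
    // Arithmetic right shift (guaranteed for signed types since C++20).
    std::int64_t v = (acc + half) >> shft;
    // Saturation: clamp to the signed 16-bit output range.
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return static_cast<std::int16_t>(v);
}
```

Under these assumptions, `srs_lane_model(1000, 4)` rounds 62.5 up to 63, and a value too large for 16 bits is clamped to 32767 instead of wrapping.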

There are four main variants of the SRS intrinsics, distinguished by the widths of the input and output data-types:

bsrs is used to convert integer
- 48-bit accumulator data into a corresponding signed 8-bit vector
ubsrs is used to convert integer
- 48-bit accumulator data into a corresponding unsigned 8-bit vector
srs is used to convert integer
- 48-bit accumulator data into a corresponding 16-bit vector, or
- 80-bit accumulator data into a corresponding 32-bit vector
lsrs is used to convert integer
- 48-bit accumulator data into a corresponding 32-bit vector, or
- 80-bit accumulator data into a corresponding 64-bit vector
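To make the difference between the variants concrete, the sketch below models how the same 48-bit lane value lands in each output width. It is a simplified illustration under stated assumptions: all function names are hypothetical, the shift truncates (no rounding), saturation is assumed to be enabled, and only the 48-bit input rows of the list above are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar model: shift one accumulator lane right (truncating)
// and clamp the result to the range of the output lane type Out.
template <typename Out>
Out shift_saturate_model(std::int64_t acc, int shft) {
    assert(shft >= 0 && shft <= 62);  // negative shifts are not modelled here
    std::int64_t v = acc >> shft;     // arithmetic shift (C++20 guarantee)
    const std::int64_t lo = std::numeric_limits<Out>::min();
    const std::int64_t hi = std::numeric_limits<Out>::max();
    if (v < lo) v = lo;
    if (v > hi) v = hi;
    return static_cast<Out>(v);
}

// One hypothetical wrapper per variant, for a 48-bit accumulator lane.
inline std::int8_t bsrs_lane_model(std::int64_t a, int s) {
    return shift_saturate_model<std::int8_t>(a, s);    // signed 8-bit
}
inline std::uint8_t ubsrs_lane_model(std::int64_t a, int s) {
    return shift_saturate_model<std::uint8_t>(a, s);   // unsigned 8-bit
}
inline std::int16_t srs16_lane_model(std::int64_t a, int s) {
    return shift_saturate_model<std::int16_t>(a, s);   // 16-bit
}
inline std::int32_t lsrs32_lane_model(std::int64_t a, int s) {
    return shift_saturate_model<std::int32_t>(a, s);   // 32-bit
}
```

In this model the value 300 with a shift of 0 saturates to 127 in the signed 8-bit case and to 255 in the unsigned 8-bit case, while the 16-bit and 32-bit variants keep it unchanged.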

There is also a variant of the SRS intrinsic, srs_ilv, that interleaves the values from the bottom and top halves of the accumulator. The bottom half is the least significant (LSB) half, and the first output value is the first value from this half.

Example

Using the standard srs intrinsic, the 8 accumulator lanes of a v8cacc48 are shifted directly to the 8 output lanes of a v8cint16. Each lane is shifted, rounded and saturated independently (depending on the parameters):

o0 = srs(acc0)
o1 = srs(acc1)
o2 = srs(acc2)
o3 = srs(acc3)
o4 = srs(acc4)
o5 = srs(acc5)
o6 = srs(acc6)
o7 = srs(acc7)

Using the interleaved srs_ilv intrinsic, the lanes from the accumulator are interleaved to generate the output lanes:

o0 = srs(acc0)
o1 = srs(acc4)
o2 = srs(acc1)
o3 = srs(acc5)
o4 = srs(acc2)
o5 = srs(acc6)
o6 = srs(acc3)
o7 = srs(acc7)
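The interleaved lane ordering listed above follows a simple index rule: even output lanes come from the bottom half and odd output lanes from the top half. The sketch below models only that reordering on a plain array; `interleave_halves` is a hypothetical helper, and the per-lane shift, round and saturate step is deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the srs_ilv lane ordering only:
//   out[2*i]     = in[i]          (bottom half)
//   out[2*i + 1] = in[i + N/2]    (top half)
template <typename T, std::size_t N>
std::array<T, N> interleave_halves(const std::array<T, N>& in) {
    static_assert(N % 2 == 0, "lane count must be even");
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        out[2 * i]     = in[i];
        out[2 * i + 1] = in[i + N / 2];
    }
    return out;
}
```

For 8 lanes numbered 0 to 7, this yields the order 0, 4, 1, 5, 2, 6, 3, 7, matching the o0..o7 assignments in the example.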

See Also: 'ups' Intrinsics (Upshift)

48-bit Integer Accumulator Conversion into 8-bit/16-bit/32-bit Integer Vector
v16int8	bsrs (v16acc48 a, int shft)

v16uint8	ubsrs (v16acc48 a, int shft)

v8int16	srs (v8acc48 a, int shft)

v4cint16	srs (v4cacc48 a, int shft)

v8int32	lsrs (v8acc48 a, int shft)

v4cint32	lsrs (v4cacc48 a, int shft)

v16int16	srs (v16acc48 a, int shft)

v8cint16	srs (v8cacc48 a, int shft)

v16int32	lsrs (v16acc48 a, int shft)

v8cint32	lsrs (v8cacc48 a, int shft)

80-bit Integer Accumulator Conversion into 32-bit/64-bit Integer Vector
v4int32	srs (v4acc80 a, int shft)

v2cint32	srs (v2cacc80 a, int shft)

v4int64	lsrs (v4acc80 a, int shft)

v2cint64	lsrs (v2cacc80 a, int shft)

v8int32	srs (v8acc80 a, int shft)

v4cint32	srs (v4cacc80 a, int shft)

v8int64	lsrs (v8acc80 a, int shft)

v4cint64	lsrs (v4cacc80 a, int shft)

Interleaved Integer Accumulator Conversion
v16int16	srs_ilv (v16acc48 a, int shft)

v8cint16	srs_ilv (v8cacc48 a, int shft)

v8int32	srs_ilv (v8acc80 a, int shft)

v4cint32	srs_ilv (v4cacc80 a, int shft)