AI Engine Intrinsics (AIE) r2p21
16-bit Real x 16-bit Real

Overview

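16-bit real x 16-bit real multiplication intrinsics. In the overloads without a ybuff argument, both operands are selected from the same X input buffer (self multiplication); the overloads that take ybuff select the second operand from a separate Y input buffer.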

Functions

v16acc48 mac16 (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-accumulate intrinsic function.
 
v16acc48 mac16 (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-accumulate intrinsic function using a small (32-element) X input buffer.
 
v16acc48 mac16 (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-accumulate intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.
 
v8acc48 mac8 (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-accumulate intrinsic function.
 
v8acc48 mac8 (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-accumulate intrinsic function using a small (32-element) X input buffer.
 
v8acc48 mac8 (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-accumulate intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.
 
v16acc48 msc16 (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-subtract intrinsic function.
 
v16acc48 msc16 (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-subtract intrinsic function using a small (32-element) X input buffer.
 
v16acc48 msc16 (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-subtract intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.
 
v8acc48 msc8 (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-subtract intrinsic function.
 
v8acc48 msc8 (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-subtract intrinsic function using a small (32-element) X input buffer.
 
v8acc48 msc8 (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-subtract intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.
 
v16acc48 mul16 (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply intrinsic function.
 
v16acc48 mul16 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply intrinsic function using a small (32-element) X input buffer.
 
v16acc48 mul16 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.
 
v8acc48 mul8 (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply intrinsic function.
 
v8acc48 mul8 (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply intrinsic function using a small (32-element) X input buffer.
 
v8acc48 mul8 (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.
 
v16acc48 negmul16 (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-negate intrinsic function.
 
v16acc48 negmul16 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-negate intrinsic function using a small (32-element) X input buffer.
 
v16acc48 negmul16 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-negate intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.
 
v8acc48 negmul8 (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-negate intrinsic function.
 
v8acc48 negmul8 (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-negate intrinsic function using a small (32-element) X input buffer.
 
v8acc48 negmul8 (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-negate intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.
 

Function Documentation

v16acc48 mac16(v16acc48 acc,
               v64int16 xbuff,
               int xstart,
               unsigned int xoffsets,
               unsigned int xoffsets_hi,
               unsigned int xsquare,
               int ystart,
               unsigned int yoffsets,
               unsigned int yoffsets_hi,
               unsigned int ysquare)

Multiply-accumulate intrinsic function.

acc0 += x00*y00 + x01*y01
acc1 += x10*y10 + x11*y11
acc2 += x20*y20 + x21*y21
acc3 += x30*y30 + x31*y31
acc4 += x40*y40 + x41*y41
acc5 += x50*y50 + x51*y51
acc6 += x60*y60 + x61*y61
acc7 += x70*y70 + x71*y71
acc8 += x80*y80 + x81*y81
acc9 += x90*y90 + x91*y91
acc10 += x100*y100 + x101*y101
acc11 += x110*y110 + x111*y111
acc12 += x120*y120 + x121*y121
acc13 += x130*y130 + x131*y131
acc14 += x140*y140 + x141*y141
acc15 += x150*y150 + x151*y151

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
acc           v16acc48      Incoming accumulation vector (16 x int48 lanes)
xbuff         v64int16      Input buffer of 64 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also selected from the X buffer
yoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.
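The lane equations above can be checked on a host with a small scalar reference model. The sketch below is a hypothetical helper (`mac16_ref`, not part of the AIE intrinsics API). It assumes the operands have already been selected from the buffer, so `x[lane][col]` and `y[lane][col]` stand for the values that the start, offset and square parameters would pick; `int64_t` stands in for the 48-bit accumulator lanes, and 48-bit overflow behavior is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of the mac16 lane arithmetic (not an AIE API).
// x[lane][col] / y[lane][col] are ASSUMED to be the already-selected int16
// operands; int64_t stands in for each int48 accumulator lane.
using Acc16 = std::array<std::int64_t, 16>;
using Ops16 = std::array<std::array<std::int16_t, 2>, 16>;

Acc16 mac16_ref(Acc16 acc, const Ops16& x, const Ops16& y) {
    for (int lane = 0; lane < 16; ++lane) {
        // acc_lane += x_lane0 * y_lane0 + x_lane1 * y_lane1
        for (int col = 0; col < 2; ++col) {
            acc[lane] += static_cast<std::int64_t>(x[lane][col]) * y[lane][col];
        }
    }
    return acc;
}
```

Widening to `int64_t` before multiplying keeps each int16 x int16 product exact, which mirrors the documented int48 accumulator better than multiplying in 16 or 32 bits would.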

v16acc48 mac16(v16acc48 acc,
               v32int16 xbuff,
               int xstart,
               unsigned int xoffsets,
               unsigned int xoffsets_hi,
               unsigned int xsquare,
               int ystart,
               unsigned int yoffsets,
               unsigned int yoffsets_hi,
               unsigned int ysquare)

Multiply-accumulate intrinsic function using a small (32-element) X input buffer.

acc0 += x00*y00 + x01*y01
acc1 += x10*y10 + x11*y11
acc2 += x20*y20 + x21*y21
acc3 += x30*y30 + x31*y31
acc4 += x40*y40 + x41*y41
acc5 += x50*y50 + x51*y51
acc6 += x60*y60 + x61*y61
acc7 += x70*y70 + x71*y71
acc8 += x80*y80 + x81*y81
acc9 += x90*y90 + x91*y91
acc10 += x100*y100 + x101*y101
acc11 += x110*y110 + x111*y111
acc12 += x120*y120 + x121*y121
acc13 += x130*y130 + x131*y131
acc14 += x140*y140 + x141*y141
acc15 += x150*y150 + x151*y151

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
acc           v16acc48      Incoming accumulation vector (16 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also selected from the X buffer
yoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.
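In the overloads without a ybuff argument, both operands come from the same X buffer and ystart positions the second operand. The sketch below illustrates that idea under a loudly stated assumption: lane i, column c reads `buf[start + 2*i + c]`, a simple contiguous pattern chosen for illustration, not the documented decoding of xoffsets, xoffsets_hi and xsquare. The name `self_mac16_sketch` is hypothetical. With `xstart == ystart`, each lane accumulates the sum of two squares.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of "self" multiplication (not an AIE API).
// ASSUMPTION: lane i, column c reads buf[start + 2*i + c] for both operands;
// the real intrinsic decodes the offset/square parameters differently.
// int64_t stands in for the int48 accumulator lanes.
template <std::size_t N>
std::array<std::int64_t, 16> self_mac16_sketch(std::array<std::int64_t, 16> acc,
                                               const std::array<std::int16_t, N>& buf,
                                               std::size_t xstart, std::size_t ystart) {
    for (std::size_t lane = 0; lane < 16; ++lane) {
        for (std::size_t col = 0; col < 2; ++col) {
            const std::size_t xi = xstart + 2 * lane + col;
            const std::size_t yi = ystart + 2 * lane + col;
            assert(xi < N && yi < N);  // wrap-around is not modeled in this sketch
            acc[lane] += static_cast<std::int64_t>(buf[xi]) * buf[yi];
        }
    }
    return acc;
}
```

With a 32-element buffer this pattern already consumes every element from position 0, so a non-zero ystart in the sketch needs a larger buffer.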

v16acc48 mac16(v16acc48 acc,
               v32int16 xbuff,
               int xstart,
               unsigned int xoffsets,
               unsigned int xoffsets_hi,
               unsigned int xsquare,
               v32int16 ybuff,
               int ystart,
               unsigned int yoffsets,
               unsigned int yoffsets_hi,
               unsigned int ysquare)

Multiply-accumulate intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.

acc0 += x00*y00 + x01*y01
acc1 += x10*y10 + x11*y11
acc2 += x20*y20 + x21*y21
acc3 += x30*y30 + x31*y31
acc4 += x40*y40 + x41*y41
acc5 += x50*y50 + x51*y51
acc6 += x60*y60 + x61*y61
acc7 += x70*y70 + x71*y71
acc8 += x80*y80 + x81*y81
acc9 += x90*y90 + x91*y91
acc10 += x100*y100 + x101*y101
acc11 += x110*y110 + x111*y111
acc12 += x120*y120 + x121*y121
acc13 += x130*y130 + x131*y131
acc14 += x140*y140 + x141*y141
acc15 += x150*y150 + x151*y151

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
acc           v16acc48      Incoming accumulation vector (16 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ybuff         v32int16      Right (Y) input buffer of 32 elements of type int16
ystart        int           Starting position offset applied to all lanes of the second input, selected from the Y buffer
yoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.
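The xsquare and ysquare parameters select the order of a four-element "mini-permute square", with default 0x3210 and the least significant bits applying to the first element. The sketch below (hypothetical `apply_square_sketch`) assumes each hex digit names the source element for the corresponding output position, which makes 0x3210 the identity. This reading is an assumption; check the square-parameter documentation before relying on it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of the square selector (not an AIE API).
// ASSUMPTION: hex digit `pos` of `square` (least significant digit first)
// gives the index of the source element placed at output position `pos`.
std::array<std::int16_t, 4> apply_square_sketch(const std::array<std::int16_t, 4>& in,
                                                unsigned int square) {
    std::array<std::int16_t, 4> out{};
    for (unsigned int pos = 0; pos < 4; ++pos) {
        const unsigned int sel = (square >> (4 * pos)) & 0xFu;
        assert(sel < 4);  // only digits 0..3 are meaningful in this sketch
        out[pos] = in[sel];
    }
    return out;
}
```

Under this reading, 0x2301 swaps the elements pairwise: positions 0 and 1 exchange, as do positions 2 and 3.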

v8acc48 mac8(v8acc48 acc,
             v64int16 xbuff,
             int xstart,
             unsigned int xoffsets,
             int xstep,
             unsigned int xsquare,
             int ystart,
             unsigned int yoffsets,
             int ystep,
             unsigned int ysquare)

Multiply-accumulate intrinsic function.

acc0 += x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 += x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 += x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 += x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 += x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 += x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 += x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 += x70*y70 + x71*y71 + x72*y72 + x73*y73

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
acc           v8acc48       Incoming accumulation vector (8 x int48 lanes)
xbuff         v64int16      Input buffer of 64 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xstep         int           Step between columns when selecting from the X buffer
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also selected from the X buffer
yoffsets      unsigned int  4-bit offset for each lane of the second input in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
ystep         int           Step between columns when selecting the second input from the X buffer
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.
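mac8 sums four products per lane and adds per-column steps (xstep, ystep) to the selection. The sketch below models that shape on the host under stated assumptions: lane i, column c reads `buf[start + lane_off[i] + c * step]`, where `lane_off` holds plain element indices rather than the packed 4-bit xoffsets/yoffsets encoding, and the square permutation stays at its identity default. The names `pick`, `mac8_sketch`, `Acc8` and `LaneOffsets8` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side sketch of mac8 (not an AIE API).
// ASSUMPTIONS: plain per-lane offsets instead of the packed 4-bit encoding,
// identity square, no wrap-around; int64_t stands in for int48 lanes.
using Acc8 = std::array<std::int64_t, 8>;
using LaneOffsets8 = std::array<int, 8>;

template <std::size_t N>
std::int16_t pick(const std::array<std::int16_t, N>& buf, int start, int lane_off,
                  int col, int step) {
    const long long idx = static_cast<long long>(start) + lane_off +
                          static_cast<long long>(col) * step;
    assert(idx >= 0 && idx < static_cast<long long>(N));
    return buf[static_cast<std::size_t>(idx)];
}

// acc[lane] += sum over 4 columns of x(lane, col) * y(lane, col); both operands
// are read from the same buffer, as in the overloads without ybuff.
template <std::size_t N>
Acc8 mac8_sketch(Acc8 acc, const std::array<std::int16_t, N>& buf,
                 int xstart, const LaneOffsets8& xoff, int xstep,
                 int ystart, const LaneOffsets8& yoff, int ystep) {
    for (int lane = 0; lane < 8; ++lane) {
        for (int col = 0; col < 4; ++col) {
            const std::int64_t x = pick(buf, xstart, xoff[lane], col, xstep);
            const std::int64_t y = pick(buf, ystart, yoff[lane], col, ystep);
            acc[lane] += x * y;
        }
    }
    return acc;
}
```

With `xoff = {0, 1, ..., 7}`, `yoff` all zero and both steps equal to 1, lane i computes the 4-term sliding dot product `buf[xstart+i+c] * buf[ystart+c]` summed over c, the shape of a short FIR-style correlation.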

v8acc48 mac8(v8acc48 acc,
             v32int16 xbuff,
             int xstart,
             unsigned int xoffsets,
             int xstep,
             unsigned int xsquare,
             int ystart,
             unsigned int yoffsets,
             int ystep,
             unsigned int ysquare)

Multiply-accumulate intrinsic function using a small (32-element) X input buffer.

acc0 += x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 += x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 += x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 += x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 += x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 += x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 += x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 += x70*y70 + x71*y71 + x72*y72 + x73*y73

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
acc           v8acc48       Incoming accumulation vector (8 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xstep         int           Step between columns when selecting from the X buffer
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also selected from the X buffer
yoffsets      unsigned int  4-bit offset for each lane of the second input in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
ystep         int           Step between columns when selecting the second input from the X buffer
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.
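Each mac8 lane adds four int16 x int16 products per call, so it is worth checking how many calls a 48-bit lane can absorb. The largest product magnitude is (-32768) x (-32768) = 2^30, so one mac8 call adds at most 2^32 to a lane and one mac16 call at most 2^31. The compile-time sketch below (hypothetical names) counts how many such worst-case accumulations, starting from zero, stay at or below 2^47 - 1; it says nothing about how the hardware behaves beyond that range.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case int16 x int16 product magnitude: (-32768) * (-32768) = 2^30.
constexpr std::int64_t kMaxProduct = std::int64_t{32768} * 32768;
// Largest value of a signed 48-bit accumulator lane: 2^47 - 1.
constexpr std::int64_t kInt48Max = (std::int64_t{1} << 47) - 1;

// Worst-case accumulations from zero that keep a lane within the positive
// int48 range when each call adds `terms` products (4 for mac8, 2 for mac16).
constexpr std::int64_t safe_accumulations(std::int64_t terms) {
    return kInt48Max / (terms * kMaxProduct);
}

static_assert(safe_accumulations(4) == 32767, "mac8: at most 2^32 per call");
static_assert(safe_accumulations(2) == 65535, "mac16: at most 2^31 per call");
```

For realistic data the headroom is far larger; the bound only matters for inputs that sit at the extremes of the int16 range.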

v8acc48 mac8(v8acc48 acc,
             v32int16 xbuff,
             int xstart,
             unsigned int xoffsets,
             int xstep,
             unsigned int xsquare,
             v32int16 ybuff,
             int ystart,
             unsigned int yoffsets,
             int ystep,
             unsigned int ysquare)

Multiply-accumulate intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.

acc0 += x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 += x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 += x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 += x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 += x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 += x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 += x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 += x70*y70 + x71*y71 + x72*y72 + x73*y73

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
acc           v8acc48       Incoming accumulation vector (8 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xstep         int           Step between columns when selecting from the X buffer
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ybuff         v32int16      Right (Y) input buffer of 32 elements of type int16
ystart        int           Starting position offset applied to all lanes of the second input, selected from the Y buffer
yoffsets      unsigned int  4-bit offset for each lane in the Y buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
ystep         int           Step between columns when selecting from the Y buffer
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.

v16acc48 msc16(v16acc48 acc,
               v64int16 xbuff,
               int xstart,
               unsigned int xoffsets,
               unsigned int xoffsets_hi,
               unsigned int xsquare,
               int ystart,
               unsigned int yoffsets,
               unsigned int yoffsets_hi,
               unsigned int ysquare)

Multiply-subtract intrinsic function.

acc0 -= x00*y00 + x01*y01
acc1 -= x10*y10 + x11*y11
acc2 -= x20*y20 + x21*y21
acc3 -= x30*y30 + x31*y31
acc4 -= x40*y40 + x41*y41
acc5 -= x50*y50 + x51*y51
acc6 -= x60*y60 + x61*y61
acc7 -= x70*y70 + x71*y71
acc8 -= x80*y80 + x81*y81
acc9 -= x90*y90 + x91*y91
acc10 -= x100*y100 + x101*y101
acc11 -= x110*y110 + x111*y111
acc12 -= x120*y120 + x121*y121
acc13 -= x130*y130 + x131*y131
acc14 -= x140*y140 + x141*y141
acc15 -= x150*y150 + x151*y151

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
acc           v16acc48      Incoming accumulation vector (16 x int48 lanes)
xbuff         v64int16      Input buffer of 64 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also selected from the X buffer
yoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.
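msc16 applies the same per-lane sum of two products as mac16 but subtracts it from the accumulator. The hypothetical host-side models below (`mac16_ref`, `msc16_ref`, not AIE APIs) work on already-selected operands, with `int64_t` standing in for the int48 lanes; a useful consequence, checked in the test, is that an msc16 with the same operands undoes a preceding mac16 exactly when no 48-bit overflow occurs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side models (not AIE APIs); operands are ASSUMED to be
// already selected per lane and column; int64_t stands in for int48 lanes.
using Acc16 = std::array<std::int64_t, 16>;
using Ops16 = std::array<std::array<std::int16_t, 2>, 16>;

// Sum of the two products that feed one lane.
std::int64_t lane_dot2(const Ops16& x, const Ops16& y, int lane) {
    return static_cast<std::int64_t>(x[lane][0]) * y[lane][0] +
           static_cast<std::int64_t>(x[lane][1]) * y[lane][1];
}

Acc16 mac16_ref(Acc16 acc, const Ops16& x, const Ops16& y) {
    // acc_lane += x_lane0 * y_lane0 + x_lane1 * y_lane1
    for (int lane = 0; lane < 16; ++lane) acc[lane] += lane_dot2(x, y, lane);
    return acc;
}

Acc16 msc16_ref(Acc16 acc, const Ops16& x, const Ops16& y) {
    // acc_lane -= x_lane0 * y_lane0 + x_lane1 * y_lane1
    for (int lane = 0; lane < 16; ++lane) acc[lane] -= lane_dot2(x, y, lane);
    return acc;
}
```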

v16acc48 msc16(v16acc48 acc,
               v32int16 xbuff,
               int xstart,
               unsigned int xoffsets,
               unsigned int xoffsets_hi,
               unsigned int xsquare,
               int ystart,
               unsigned int yoffsets,
               unsigned int yoffsets_hi,
               unsigned int ysquare)

Multiply-subtract intrinsic function using a small (32-element) X input buffer.

acc0 -= x00*y00 + x01*y01
acc1 -= x10*y10 + x11*y11
acc2 -= x20*y20 + x21*y21
acc3 -= x30*y30 + x31*y31
acc4 -= x40*y40 + x41*y41
acc5 -= x50*y50 + x51*y51
acc6 -= x60*y60 + x61*y61
acc7 -= x70*y70 + x71*y71
acc8 -= x80*y80 + x81*y81
acc9 -= x90*y90 + x91*y91
acc10 -= x100*y100 + x101*y101
acc11 -= x110*y110 + x111*y111
acc12 -= x120*y120 + x121*y121
acc13 -= x130*y130 + x131*y131
acc14 -= x140*y140 + x141*y141
acc15 -= x150*y150 + x151*y151

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
acc           v16acc48      Incoming accumulation vector (16 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also selected from the X buffer
yoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.

v16acc48 msc16(v16acc48 acc,
               v32int16 xbuff,
               int xstart,
               unsigned int xoffsets,
               unsigned int xoffsets_hi,
               unsigned int xsquare,
               v32int16 ybuff,
               int ystart,
               unsigned int yoffsets,
               unsigned int yoffsets_hi,
               unsigned int ysquare)

Multiply-subtract intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.

acc0 -= x00*y00 + x01*y01
acc1 -= x10*y10 + x11*y11
acc2 -= x20*y20 + x21*y21
acc3 -= x30*y30 + x31*y31
acc4 -= x40*y40 + x41*y41
acc5 -= x50*y50 + x51*y51
acc6 -= x60*y60 + x61*y61
acc7 -= x70*y70 + x71*y71
acc8 -= x80*y80 + x81*y81
acc9 -= x90*y90 + x91*y91
acc10 -= x100*y100 + x101*y101
acc11 -= x110*y110 + x111*y111
acc12 -= x120*y120 + x121*y121
acc13 -= x130*y130 + x131*y131
acc14 -= x140*y140 + x141*y141
acc15 -= x150*y150 + x151*y151

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
acc           v16acc48      Incoming accumulation vector (16 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ybuff         v32int16      Right (Y) input buffer of 32 elements of type int16
ystart        int           Starting position offset applied to all lanes of the second input, selected from the Y buffer
yoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.

v8acc48 msc8(v8acc48 acc,
             v64int16 xbuff,
             int xstart,
             unsigned int xoffsets,
             int xstep,
             unsigned int xsquare,
             int ystart,
             unsigned int yoffsets,
             int ystep,
             unsigned int ysquare)

Multiply-subtract intrinsic function.

acc0 -= x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 -= x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 -= x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 -= x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 -= x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 -= x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 -= x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 -= x70*y70 + x71*y71 + x72*y72 + x73*y73

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
acc           v8acc48       Incoming accumulation vector (8 x int48 lanes)
xbuff         v64int16      Input buffer of 64 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xstep         int           Step between columns when selecting from the X buffer
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also selected from the X buffer
yoffsets      unsigned int  4-bit offset for each lane of the second input in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
ystep         int           Step between columns when selecting the second input from the X buffer
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.

v8acc48 msc8(v8acc48 acc,
             v32int16 xbuff,
             int xstart,
             unsigned int xoffsets,
             int xstep,
             unsigned int xsquare,
             int ystart,
             unsigned int yoffsets,
             int ystep,
             unsigned int ysquare)

Multiply-subtract intrinsic function using a small (32-element) X input buffer.

acc0 -= x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 -= x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 -= x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 -= x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 -= x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 -= x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 -= x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 -= x70*y70 + x71*y71 + x72*y72 + x73*y73

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
acc           v8acc48       Incoming accumulation vector (8 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xstep         int           Step between columns when selecting from the X buffer
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also selected from the X buffer
yoffsets      unsigned int  4-bit offset for each lane of the second input in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
ystep         int           Step between columns when selecting the second input from the X buffer
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.

v8acc48 msc8(v8acc48 acc,
             v32int16 xbuff,
             int xstart,
             unsigned int xoffsets,
             int xstep,
             unsigned int xsquare,
             v32int16 ybuff,
             int ystart,
             unsigned int yoffsets,
             int ystep,
             unsigned int ysquare)

Multiply-subtract intrinsic function using a small (32-element) X input buffer and a separate Y input buffer.

acc0 -= x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 -= x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 -= x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 -= x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 -= x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 -= x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 -= x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 -= x70*y70 + x71*y71 + x72*y72 + x73*y73

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
acc           v8acc48       Incoming accumulation vector (8 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane in the X buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xstep         int           Step between columns when selecting from the X buffer
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ybuff         v32int16      Right (Y) input buffer of 32 elements of type int16
ystart        int           Starting position offset applied to all lanes of the second input, selected from the Y buffer
yoffsets      unsigned int  4-bit offset for each lane in the Y buffer; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
ystep         int           Step between columns when selecting from the Y buffer
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.

v16acc48 mul16(v64int16 xbuff,
               int xstart,
               unsigned int xoffsets,
               unsigned int xoffsets_hi,
               unsigned int xsquare,
               int ystart,
               unsigned int yoffsets,
               unsigned int yoffsets_hi,
               unsigned int ysquare)

Multiply intrinsic function.

acc0 = x00*y00 + x01*y01
acc1 = x10*y10 + x11*y11
acc2 = x20*y20 + x21*y21
acc3 = x30*y30 + x31*y31
acc4 = x40*y40 + x41*y41
acc5 = x50*y50 + x51*y51
acc6 = x60*y60 + x61*y61
acc7 = x70*y70 + x71*y71
acc8 = x80*y80 + x81*y81
acc9 = x90*y90 + x91*y91
acc10 = x100*y100 + x101*y101
acc11 = x110*y110 + x111*y111
acc12 = x120*y120 + x121*y121
acc13 = x130*y130 + x131*y131
acc14 = x140*y140 + x141*y141
acc15 = x150*y150 + x151*y151

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
xbuff         v64int16      Input buffer of 64 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
xsquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also selected from the X buffer
yoffsets      unsigned int  4-bit offset for each lane, corresponding to 2x the lane number; every second lane is an offset relative to the preceding lane + 1. The least significant bits apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each lane. The least significant bits apply to lane 8
ysquare       unsigned int  Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element
Note
  • This intrinsic uses the 'square' parameter; see the square-parameter documentation for details on how to use it.
  • For details on how data is selected from the input buffers, see the data selection documentation.
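mul16 has no incoming accumulator: each lane is assigned (`=`) the sum of its two products rather than updated. The hypothetical host-side models below (`mul16_ref`, `mac16_ref`, not AIE APIs) make that concrete on already-selected operands, with `int64_t` standing in for the int48 lanes; the test checks that mul16 matches mac16 applied to a zeroed accumulator.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side models (not AIE APIs); operands are ASSUMED to be
// already selected per lane and column; int64_t stands in for int48 lanes.
using Acc16 = std::array<std::int64_t, 16>;
using Ops16 = std::array<std::array<std::int16_t, 2>, 16>;

Acc16 mac16_ref(Acc16 acc, const Ops16& x, const Ops16& y) {
    for (int lane = 0; lane < 16; ++lane) {
        acc[lane] += static_cast<std::int64_t>(x[lane][0]) * y[lane][0] +
                     static_cast<std::int64_t>(x[lane][1]) * y[lane][1];
    }
    return acc;
}

Acc16 mul16_ref(const Ops16& x, const Ops16& y) {
    Acc16 out{};
    for (int lane = 0; lane < 16; ++lane) {
        // acc_lane = x_lane0 * y_lane0 + x_lane1 * y_lane1 (no incoming value)
        out[lane] = static_cast<std::int64_t>(x[lane][0]) * y[lane][0] +
                    static_cast<std::int64_t>(x[lane][1]) * y[lane][1];
    }
    return out;
}
```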

v16acc48 mul16(v32int16 xbuff,
               int xstart,
               unsigned int xoffsets,
               unsigned int xoffsets_hi,
               unsigned int xsquare,
               int ystart,
               unsigned int yoffsets,
               unsigned int yoffsets_hi,
               unsigned int ysquare)

Multiply intrinsic function using a small (32-element) X input buffer.

acc0 = x00*y00 + x01*y01
acc1 = x10*y10 + x11*y11
acc2 = x20*y20 + x21*y21
acc3 = x30*y30 + x31*y31
acc4 = x40*y40 + x41*y41
acc5 = x50*y50 + x51*y51
acc6 = x60*y60 + x61*y61
acc7 = x70*y70 + x71*y71
acc8 = x80*y80 + x81*y81
acc9 = x90*y90 + x91*y91
acc10 = x100*y100 + x101*y101
acc11 = x110*y110 + x111*y111
acc12 = x120*y120 + x121*y121
acc13 = x130*y130 + x131*y131
acc14 = x140*y140 + x141*y141
acc15 = x150*y150 + x151*y151

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each of lanes 0-7, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15. The LSBs apply to lane 8
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also read from the X buffer
yoffsets      unsigned int  4-bit offset for each of lanes 0-7 of the second input, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15 of the second input. The LSBs apply to lane 8
ysquare       unsigned int  Selects the element order of the mini-permute square for the second input (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
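The offset parameters pack one 4-bit field per lane, least significant nibble first: a 32-bit `xoffsets` covers lanes 0-7 and `xoffsets_hi` covers lanes 8-15. The sketch below only decodes those bit fields; `lane_offset_field` is a hypothetical helper, and how the decoded value becomes a buffer index (the x2 scaling and the odd-lane rule) is intentionally left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the raw 4-bit offset field for `lane` (0..15)
// of a 16-lane intrinsic. Lanes 0-7 come from `offsets`, lanes 8-15 from
// `offsets_hi`, least significant nibble first.
unsigned lane_offset_field(unsigned offsets, unsigned offsets_hi, int lane) {
    assert(lane >= 0 && lane < 16);
    unsigned word = lane < 8 ? offsets : offsets_hi;  // pick the control word
    int nibble = lane % 8;                            // position within that word
    return (word >> (4 * nibble)) & 0xFu;
}
```

For example, with `xoffsets = 0x76543210` and `xoffsets_hi = 0xFEDCBA98`, lane 7 reads field 0x7 and lane 8 reads field 0x8.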
v16acc48 mul16 (v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int yoffsets,
    unsigned int yoffsets_hi,
    unsigned int ysquare)

Multiply intrinsic function using a small X input buffer and a separate Y input buffer.

acc0 = x00*y00 + x01*y01
acc1 = x10*y10 + x11*y11
acc2 = x20*y20 + x21*y21
acc3 = x30*y30 + x31*y31
acc4 = x40*y40 + x41*y41
acc5 = x50*y50 + x51*y51
acc6 = x60*y60 + x61*y61
acc7 = x70*y70 + x71*y71
acc8 = x80*y80 + x81*y81
acc9 = x90*y90 + x91*y91
acc10 = x100*y100 + x101*y101
acc11 = x110*y110 + x111*y111
acc12 = x120*y120 + x121*y121
acc13 = x130*y130 + x131*y131
acc14 = x140*y140 + x141*y141
acc15 = x150*y150 + x151*y151

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each of lanes 0-7, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15. The LSBs apply to lane 8
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ybuff         v32int16      Right input buffer of 32 elements of type int16
ystart        int           Starting position offset applied to all lanes of the input from the Y buffer
yoffsets      unsigned int  4-bit offset for each of lanes 0-7, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15. The LSBs apply to lane 8
ysquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
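The `xsquare`/`ysquare` words hold four nibbles, one per element of the mini-permute square, with the LSBs applying to the first element. The sketch below is a hypothetical model that ASSUMES nibble k names the source element placed at position k; under that reading the default 0x3210 leaves the order unchanged. `apply_square` is an illustrative name, not an AIE intrinsic, and the real hardware semantics should be confirmed against the square parameter documentation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 4-element mini-permute square.
// ASSUMPTION: nibble k (LSB first) is the index of the source element
// that lands at position k. With 0x3210, position k takes element k.
std::array<int16_t, 4> apply_square(const std::array<int16_t, 4>& in, unsigned square) {
    std::array<int16_t, 4> out{};
    for (int k = 0; k < 4; ++k) {
        unsigned src = (square >> (4 * k)) & 0xFu;
        assert(src < 4);  // a square only has 4 elements
        out[k] = in[src];
    }
    return out;
}
```

Under this assumption, 0x2301 swaps neighbouring elements, turning {10, 11, 12, 13} into {11, 10, 13, 12}.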
v8acc48 mul8 (v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int yoffsets,
    int ystep,
    unsigned int ysquare)

Multiply intrinsic function.

acc0 = x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 = x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 = x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 = x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 = x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 = x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 = x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 = x70*y70 + x71*y71 + x72*y72 + x73*y73

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
xbuff         v64int16      Input buffer of 64 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xstep         int           Step between consecutive columns when selecting from the X buffer
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also read from the X buffer
yoffsets      unsigned int  4-bit offset for each lane of the second input into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
ystep         int           Step between consecutive columns when selecting the second input from the X buffer
ysquare       unsigned int  Selects the element order of the mini-permute square for the second input (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
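The 8-lane variants sum four products per lane instead of two. A minimal scalar sketch, assuming the four operand pairs per lane are already selected (`mul8_model` and `Lanes8x4` are hypothetical names; xstart/xoffsets/xstep/xsquare are not modelled), together with a compile-time check that even the worst case fits in an int48 accumulator lane:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical operand layout: 8 lanes x 4 columns of already-selected int16 values.
using Lanes8x4 = std::array<std::array<int16_t, 4>, 8>;

// Scalar model of: acc_i = x_i0*y_i0 + x_i1*y_i1 + x_i2*y_i2 + x_i3*y_i3.
std::array<int64_t, 8> mul8_model(const Lanes8x4& x, const Lanes8x4& y) {
    std::array<int64_t, 8> acc{};
    for (int i = 0; i < 8; ++i)
        for (int c = 0; c < 4; ++c)
            acc[i] += static_cast<int64_t>(x[i][c]) * y[i][c];
    return acc;
}

// Largest magnitude a lane can reach: 4 * (-32768)^2 = 2^32,
// well inside the signed 48-bit range [-2^47, 2^47 - 1].
constexpr int64_t kMaxLane = 4 * (int64_t{32768} * 32768);
static_assert(kMaxLane == (int64_t{1} << 32), "four products of int16 extremes");
static_assert(kMaxLane < (int64_t{1} << 47), "fits in an int48 lane");
```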
v8acc48 mul8 (v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int yoffsets,
    int ystep,
    unsigned int ysquare)

Multiply intrinsic function using a small X input buffer.

acc0 = x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 = x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 = x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 = x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 = x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 = x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 = x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 = x70*y70 + x71*y71 + x72*y72 + x73*y73

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xstep         int           Step between consecutive columns when selecting from the X buffer
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also read from the X buffer
yoffsets      unsigned int  4-bit offset for each lane of the second input into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
ystep         int           Step between consecutive columns when selecting the second input from the X buffer
ysquare       unsigned int  Selects the element order of the mini-permute square for the second input (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
v8acc48 mul8 (v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int yoffsets,
    int ystep,
    unsigned int ysquare)

Multiply intrinsic function using a small X input buffer and a separate Y input buffer.

acc0 = x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 = x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 = x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 = x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 = x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 = x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 = x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 = x70*y70 + x71*y71 + x72*y72 + x73*y73

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xstep         int           Step between consecutive columns when selecting from the X buffer
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ybuff         v32int16      Right input buffer of 32 elements of type int16
ystart        int           Starting position offset applied to all lanes of the input from the Y buffer
yoffsets      unsigned int  4-bit offset for each lane into the Y buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
ystep         int           Step between consecutive columns when selecting from the Y buffer
ysquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
v16acc48 negmul16 (v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    int ystart,
    unsigned int yoffsets,
    unsigned int yoffsets_hi,
    unsigned int ysquare)

Multiply-negate intrinsic function.

acc0 = -( x00*y00 + x01*y01 )
acc1 = -( x10*y10 + x11*y11 )
acc2 = -( x20*y20 + x21*y21 )
acc3 = -( x30*y30 + x31*y31 )
acc4 = -( x40*y40 + x41*y41 )
acc5 = -( x50*y50 + x51*y51 )
acc6 = -( x60*y60 + x61*y61 )
acc7 = -( x70*y70 + x71*y71 )
acc8 = -( x80*y80 + x81*y81 )
acc9 = -( x90*y90 + x91*y91 )
acc10 = -( x100*y100 + x101*y101 )
acc11 = -( x110*y110 + x111*y111 )
acc12 = -( x120*y120 + x121*y121 )
acc13 = -( x130*y130 + x131*y131 )
acc14 = -( x140*y140 + x141*y141 )
acc15 = -( x150*y150 + x151*y151 )

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
xbuff         v64int16      Input buffer of 64 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each of lanes 0-7, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15. The LSBs apply to lane 8
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also read from the X buffer
yoffsets      unsigned int  4-bit offset for each of lanes 0-7 of the second input, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15 of the second input. The LSBs apply to lane 8
ysquare       unsigned int  Selects the element order of the mini-permute square for the second input (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
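Each negmul16 lane is the corresponding two-product sum negated, so on identical operands it is exactly the mul16 result with the sign flipped. A minimal scalar sketch, assuming already-selected operands (`mul16_model`, `negmul16_model`, `Lanes16x2` and `Acc16` are hypothetical names; buffer selection is not modelled):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical operand layout: 16 lanes x 2 columns of already-selected int16 values.
using Lanes16x2 = std::array<std::array<int16_t, 2>, 16>;
using Acc16 = std::array<int64_t, 16>;  // int48 lanes held in int64_t

// mul16 lane i: x_i0*y_i0 + x_i1*y_i1
Acc16 mul16_model(const Lanes16x2& x, const Lanes16x2& y) {
    Acc16 acc{};
    for (int i = 0; i < 16; ++i)
        acc[i] = static_cast<int64_t>(x[i][0]) * y[i][0] +
                 static_cast<int64_t>(x[i][1]) * y[i][1];
    return acc;
}

// negmul16 lane i: -(x_i0*y_i0 + x_i1*y_i1), i.e. the mul16 sum negated.
Acc16 negmul16_model(const Lanes16x2& x, const Lanes16x2& y) {
    Acc16 acc = mul16_model(x, y);
    for (auto& lane : acc) lane = -lane;
    return acc;
}
```

Negation cannot overflow here: a two-product sum is at most 2^31 in magnitude, well within an int48 lane.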
v16acc48 negmul16 (v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    int ystart,
    unsigned int yoffsets,
    unsigned int yoffsets_hi,
    unsigned int ysquare)

Multiply-negate intrinsic function using a small X input buffer.

acc0 = -( x00*y00 + x01*y01 )
acc1 = -( x10*y10 + x11*y11 )
acc2 = -( x20*y20 + x21*y21 )
acc3 = -( x30*y30 + x31*y31 )
acc4 = -( x40*y40 + x41*y41 )
acc5 = -( x50*y50 + x51*y51 )
acc6 = -( x60*y60 + x61*y61 )
acc7 = -( x70*y70 + x71*y71 )
acc8 = -( x80*y80 + x81*y81 )
acc9 = -( x90*y90 + x91*y91 )
acc10 = -( x100*y100 + x101*y101 )
acc11 = -( x110*y110 + x111*y111 )
acc12 = -( x120*y120 + x121*y121 )
acc13 = -( x130*y130 + x131*y131 )
acc14 = -( x140*y140 + x141*y141 )
acc15 = -( x150*y150 + x151*y151 )

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each of lanes 0-7, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15. The LSBs apply to lane 8
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also read from the X buffer
yoffsets      unsigned int  4-bit offset for each of lanes 0-7 of the second input, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15 of the second input. The LSBs apply to lane 8
ysquare       unsigned int  Selects the element order of the mini-permute square for the second input (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
v16acc48 negmul16 (v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int yoffsets,
    unsigned int yoffsets_hi,
    unsigned int ysquare)

Multiply-negate intrinsic function using a small X input buffer and a separate Y input buffer.

acc0 = -( x00*y00 + x01*y01 )
acc1 = -( x10*y10 + x11*y11 )
acc2 = -( x20*y20 + x21*y21 )
acc3 = -( x30*y30 + x31*y31 )
acc4 = -( x40*y40 + x41*y41 )
acc5 = -( x50*y50 + x51*y51 )
acc6 = -( x60*y60 + x61*y61 )
acc7 = -( x70*y70 + x71*y71 )
acc8 = -( x80*y80 + x81*y81 )
acc9 = -( x90*y90 + x91*y91 )
acc10 = -( x100*y100 + x101*y101 )
acc11 = -( x110*y110 + x111*y111 )
acc12 = -( x120*y120 + x121*y121 )
acc13 = -( x130*y130 + x131*y131 )
acc14 = -( x140*y140 + x141*y141 )
acc15 = -( x150*y150 + x151*y151 )

Parameters

Input/Output  Type          Comments
return        v16acc48      Returned accumulation vector (16 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each of lanes 0-7, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15. The LSBs apply to lane 8
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ybuff         v32int16      Right input buffer of 32 elements of type int16
ystart        int           Starting position offset applied to all lanes of the input from the Y buffer
yoffsets      unsigned int  4-bit offset for each of lanes 0-7, scaled by 2; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
yoffsets_hi   unsigned int  4-bit offset for each of lanes 8-15. The LSBs apply to lane 8
ysquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
v8acc48 negmul8 (v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int yoffsets,
    int ystep,
    unsigned int ysquare)

Multiply-negate intrinsic function.

acc0 = -( x00*y00 + x01*y01 + x02*y02 + x03*y03 )
acc1 = -( x10*y10 + x11*y11 + x12*y12 + x13*y13 )
acc2 = -( x20*y20 + x21*y21 + x22*y22 + x23*y23 )
acc3 = -( x30*y30 + x31*y31 + x32*y32 + x33*y33 )
acc4 = -( x40*y40 + x41*y41 + x42*y42 + x43*y43 )
acc5 = -( x50*y50 + x51*y51 + x52*y52 + x53*y53 )
acc6 = -( x60*y60 + x61*y61 + x62*y62 + x63*y63 )
acc7 = -( x70*y70 + x71*y71 + x72*y72 + x73*y73 )

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
xbuff         v64int16      Input buffer of 64 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xstep         int           Step between consecutive columns when selecting from the X buffer
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also read from the X buffer
yoffsets      unsigned int  4-bit offset for each lane of the second input into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
ystep         int           Step between consecutive columns when selecting the second input from the X buffer
ysquare       unsigned int  Selects the element order of the mini-permute square for the second input (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
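All four result shapes in this group (mul16/negmul16 with 16 lanes x 2 columns, mul8/negmul8 with 8 lanes x 4 columns) follow one pattern: a per-lane dot product, optionally negated. The sketch below is a hypothetical generic model of that pattern on already-selected operands; `lane_dot` and `Operands` are illustrative names, and the start/offsets/step/square selection parameters are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical operand layout: Lanes x Cols already-selected int16 values.
template <std::size_t Lanes, std::size_t Cols>
using Operands = std::array<std::array<int16_t, Cols>, Lanes>;

// Per-lane dot product over Cols columns; negate=true models negmul*,
// negate=false models mul*. int48 lanes are held in int64_t.
template <std::size_t Lanes, std::size_t Cols>
std::array<int64_t, Lanes> lane_dot(const Operands<Lanes, Cols>& x,
                                    const Operands<Lanes, Cols>& y,
                                    bool negate) {
    std::array<int64_t, Lanes> acc{};
    for (std::size_t i = 0; i < Lanes; ++i) {
        int64_t sum = 0;
        for (std::size_t c = 0; c < Cols; ++c)
            sum += static_cast<int64_t>(x[i][c]) * y[i][c];
        acc[i] = negate ? -sum : sum;
    }
    return acc;
}
```

Making lanes and columns template parameters lets one model cover both 16 x 2 and 8 x 4 shapes; for example `lane_dot<8, 4>(x, y, true)` follows the negmul8 formula above.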
v8acc48 negmul8 (v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int yoffsets,
    int ystep,
    unsigned int ysquare)

Multiply-negate intrinsic function using a small X input buffer.

acc0 = -( x00*y00 + x01*y01 + x02*y02 + x03*y03 )
acc1 = -( x10*y10 + x11*y11 + x12*y12 + x13*y13 )
acc2 = -( x20*y20 + x21*y21 + x22*y22 + x23*y23 )
acc3 = -( x30*y30 + x31*y31 + x32*y32 + x33*y33 )
acc4 = -( x40*y40 + x41*y41 + x42*y42 + x43*y43 )
acc5 = -( x50*y50 + x51*y51 + x52*y52 + x53*y53 )
acc6 = -( x60*y60 + x61*y61 + x62*y62 + x63*y63 )
acc7 = -( x70*y70 + x71*y71 + x72*y72 + x73*y73 )

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xstep         int           Step between consecutive columns when selecting from the X buffer
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ystart        int           Starting position offset applied to all lanes of the second input, which is also read from the X buffer
yoffsets      unsigned int  4-bit offset for each lane of the second input into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
ystep         int           Step between consecutive columns when selecting the second input from the X buffer
ysquare       unsigned int  Selects the element order of the mini-permute square for the second input (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.
v8acc48 negmul8 (v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int yoffsets,
    int ystep,
    unsigned int ysquare)

Multiply-negate intrinsic function using a small X input buffer and a separate Y input buffer.

acc0 = -( x00*y00 + x01*y01 + x02*y02 + x03*y03 )
acc1 = -( x10*y10 + x11*y11 + x12*y12 + x13*y13 )
acc2 = -( x20*y20 + x21*y21 + x22*y22 + x23*y23 )
acc3 = -( x30*y30 + x31*y31 + x32*y32 + x33*y33 )
acc4 = -( x40*y40 + x41*y41 + x42*y42 + x43*y43 )
acc5 = -( x50*y50 + x51*y51 + x52*y52 + x53*y53 )
acc6 = -( x60*y60 + x61*y61 + x62*y62 + x63*y63 )
acc7 = -( x70*y70 + x71*y71 + x72*y72 + x73*y73 )

Parameters

Input/Output  Type          Comments
return        v8acc48       Returned accumulation vector (8 x int48 lanes)
xbuff         v32int16      Input buffer of 32 elements of type int16
xstart        int           Starting position offset applied to all lanes of the input from the X buffer
xoffsets      unsigned int  4-bit offset for each lane into the X buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
xstep         int           Step between consecutive columns when selecting from the X buffer
xsquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
ybuff         v32int16      Right input buffer of 32 elements of type int16
ystart        int           Starting position offset applied to all lanes of the input from the Y buffer
yoffsets      unsigned int  4-bit offset for each lane into the Y buffer; every odd lane's offset is relative to the previous lane's offset + 1. The LSBs apply to the first lane
ystep         int           Step between consecutive columns when selecting from the Y buffer
ysquare       unsigned int  Selects the element order of the mini-permute square (default 0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameter; for details on how to use it, see the square parameter documentation.
  • For details on how data is selected from the buffers, see the data selection documentation.