Overview

Advanced Float Vector Operations. This page documents the fully configurable fpmac_conf intrinsic and several convenience wrappers around it. The lane selection scheme is explained after each intrinsic definition.

Some of these floating-point operations can generate exceptions; see the floating-point exception documentation for more information.
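The lane selection details appear with each intrinsic definition, not in this list, so the following C++ sketch only illustrates the general idea of start/offset addressing. It assumes that output lane i takes a 4-bit offset from bits 4i..4i+3 of `xoffs` or `zoffs`, adds it to `xstart` or `zstart`, and wraps the result modulo the buffer length. The helpers `select_lane` and `ref_fpmac` are hypothetical, and `std::array<float, N>` stands in for the vN float types; check the assumed encoding against the per-intrinsic description before relying on it.

```cpp
#include <array>
#include <cstddef>

// ASSUMPTION: lane i reads a 4-bit offset from bits [4i+3:4i] of offs,
// and the resulting index wraps modulo the buffer length N.
template <std::size_t N>
float select_lane(const std::array<float, N>& buf, int start,
                  unsigned int offs, unsigned int lane) {
    const int nibble = static_cast<int>((offs >> (4u * lane)) & 0xFu);
    const int n = static_cast<int>(N);
    const int idx = ((start + nibble) % n + n) % n;  // keep idx in [0, N)
    return buf[static_cast<std::size_t>(idx)];
}

// Hypothetical 8-lane real-times-real model:
// ret[i] = acc[i] + xbuf[sel_x(i)] * zbuf[sel_z(i)].
template <std::size_t NX>
std::array<float, 8> ref_fpmac(std::array<float, 8> acc,
                               const std::array<float, NX>& xbuf, int xstart, unsigned int xoffs,
                               const std::array<float, 8>& zbuf, int zstart, unsigned int zoffs) {
    for (unsigned int i = 0; i < 8; ++i) {
        acc[i] += select_lane(xbuf, xstart, xoffs, i) *
                  select_lane(zbuf, zstart, zoffs, i);
    }
    return acc;
}
```

Under this assumed encoding, an offset word of `0x76543210` walks lanes 0 to 7 in order starting at `xstart`, while an offset word of `0` broadcasts `xbuf[xstart]` to every lane.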

Functions
v8float	fpabs_mul (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and take absolute value for single precision real times real floating point vectors.

v8float	fpabs_mul (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and take absolute value for single precision real times real floating point vectors.

v8float	fpabs_mul (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and take absolute value for single precision real times real floating point vectors.

v8float	fpabs_mul (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and take absolute value for single precision real times real floating point vectors.

v8float	fpabs_mul (v8float xbuf, v8float zbuf)
	Multiply and take absolute value for single precision real times real floating point vectors.

v8float	fpabs_mul (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and take absolute value for single precision real times real floating point vectors.

v8float	fpabs_mul (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and take absolute value for single precision real times real floating point vectors.
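The two-operand overload `fpabs_mul(v8float xbuf, v8float zbuf)` takes no start/offset arguments, so a scalar model of it reduces to a per-lane product followed by an absolute value. The sketch below assumes lane i pairs `xbuf[i]` with `zbuf[i]` and uses `std::array<float, 8>` as a stand-in for `v8float`; `ref_fpabs_mul` is a hypothetical name, not part of the intrinsic API.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Stand-in for v8float (assumption: 8 single-precision lanes).
using v8 = std::array<float, 8>;

// Hypothetical model of fpabs_mul(xbuf, zbuf): ret[i] = |xbuf[i] * zbuf[i]|,
// assuming lane i pairs xbuf[i] with zbuf[i].
v8 ref_fpabs_mul(const v8& x, const v8& z) {
    v8 ret{};
    for (std::size_t i = 0; i < ret.size(); ++i)
        ret[i] = std::fabs(x[i] * z[i]);
    return ret;
}
```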

v8float	fpmac (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times real floating point vectors.

v8float	fpmac (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times real floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times complex floating point vectors.

v8float	fpmac (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times real floating point vectors.

v8float	fpmac (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times real floating point vectors.

v8float	fpmac (v8float acc, v8float xbuf, v8float zbuf)
	Multiply and accumulate for single precision real times real floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times complex floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v4float xbuf, v4cfloat zbuf)
	Multiply and accumulate for single precision real times complex floating point vectors.

v8float	fpmac (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times real floating point vectors.

v8float	fpmac (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times real floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times complex floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times real floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times real floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v4cfloat xbuf, v8float zbuf)
	Multiply and accumulate for single precision complex times real floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
	Multiply and accumulate for single precision complex times complex floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times real floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex floating point vectors.

v4cfloat	fpmac (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex floating point vectors.
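The complex overloads of `fpmac` differ only in which operands are complex. The sketch below models the two simple forms `fpmac(v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)` and `fpmac(v4cfloat acc, v4float xbuf, v4cfloat zbuf)` with `std::complex<float>`, assuming lane i pairs `xbuf[i]` with `zbuf[i]`. The names `ref_fpmac_complex` and `ref_fpmac_real_complex` and the array aliases are hypothetical stand-ins.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using cf = std::complex<float>;
using v4cf = std::array<cf, 4>;    // stand-in for v4cfloat (assumption)
using v4f = std::array<float, 4>;  // stand-in for v4float (assumption)

// Complex times complex: ret[i] = acc[i] + x[i] * z[i].
v4cf ref_fpmac_complex(v4cf acc, const v4cf& x, const v4cf& z) {
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] += x[i] * z[i];
    return acc;
}

// Real times complex: the real lane scales both parts of the complex lane.
v4cf ref_fpmac_real_complex(v4cf acc, const v4f& x, const v4cf& z) {
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] += x[i] * z[i];
    return acc;
}
```

For example, with `acc[0] = 1+1i`, `x[0] = 1+2i` and `z[0] = 3+4i`, the product is `-5+10i` and the accumulated lane becomes `-4+11i`.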

v8float	fpmac_abs (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

v8float	fpmac_abs (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

v8float	fpmac_abs (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

v8float	fpmac_abs (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

v8float	fpmac_abs (v8float acc, v8float xbuf, v8float zbuf)
	Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

v8float	fpmac_abs (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

v8float	fpmac_abs (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply, take absolute value and accumulate for single precision real times real floating point vectors.
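Because `fpmac_abs` adds the absolute value of each product to the accumulator, calling it repeatedly builds a per-lane sum of |x*z| across input blocks. The sketch below models the two-operand form `fpmac_abs(v8float acc, v8float xbuf, v8float zbuf)`, assuming lane i pairs `xbuf[i]` with `zbuf[i]`; `ref_fpmac_abs` and `sum_abs_products` are hypothetical helpers.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using v8 = std::array<float, 8>;  // stand-in for v8float (assumption)

// Hypothetical model: ret[i] = acc[i] + |x[i] * z[i]|.
v8 ref_fpmac_abs(v8 acc, const v8& x, const v8& z) {
    for (std::size_t i = 0; i < acc.size(); ++i)
        acc[i] += std::fabs(x[i] * z[i]);
    return acc;
}

// Usage: accumulate |x * z| over several input blocks, lane by lane.
v8 sum_abs_products(const v8* xs, const v8* zs, std::size_t blocks) {
    v8 acc{};  // start from an all-zero accumulator
    for (std::size_t b = 0; b < blocks; ++b)
        acc = ref_fpmac_abs(acc, xs[b], zs[b]);
    return acc;
}
```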

v4cfloat	fpmac_c (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmac_c (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmac_c (v4cfloat acc, v4float xbuf, v4cfloat zbuf)
	Multiply and accumulate for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmac_c (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmac_c (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmac_c (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmac_c (v4cfloat acc, v4cfloat xbuf, v8float zbuf)
	Multiply and accumulate for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmac_c (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times real floating point vectors.
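The `fpmac_c` variants conjugate the complex operand before multiplying. For a real `x`, `x * conj(z)` equals `conj(x * z)`, so the result differs from plain `fpmac` only in the sign of the imaginary part. The sketch below models `fpmac_c(v4cfloat acc, v4float xbuf, v4cfloat zbuf)` under the assumption that lane i pairs `xbuf[i]` with `zbuf[i]`; `ref_fpmac_c` is a hypothetical name. The complex-conjugate-times-real overloads compute `conj(x[i]) * z[i]` with a real `z`, which gives the same values with the roles swapped.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using cf = std::complex<float>;
using v4cf = std::array<cf, 4>;    // stand-in for v4cfloat (assumption)
using v4f = std::array<float, 4>;  // stand-in for v4float (assumption)

// Hypothetical model of fpmac_c(acc, v4float xbuf, v4cfloat zbuf):
// real times complex conjugate, ret[i] = acc[i] + x[i] * conj(z[i]).
v4cf ref_fpmac_c(v4cf acc, const v4f& x, const v4cf& z) {
    for (std::size_t i = 0; i < acc.size(); ++i)
        acc[i] += x[i] * std::conj(z[i]);
    return acc;
}
```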

v4cfloat	fpmac_cc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmac_cc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmac_cc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmac_cc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmac_cc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
	Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmac_cc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmac_cc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.
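Conjugating both operands is the same as conjugating the product, since `conj(x) * conj(z) == conj(x * z)`. The sketch below models the simple form `fpmac_cc(v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)`, assuming lane i pairs `xbuf[i]` with `zbuf[i]`; `ref_fpmac_cc` is a hypothetical name.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using cf = std::complex<float>;
using v4cf = std::array<cf, 4>;  // stand-in for v4cfloat (assumption)

// Hypothetical model: ret[i] = acc[i] + conj(x[i]) * conj(z[i]).
v4cf ref_fpmac_cc(v4cf acc, const v4cf& x, const v4cf& z) {
    for (std::size_t i = 0; i < acc.size(); ++i)
        acc[i] += std::conj(x[i]) * std::conj(z[i]);
    return acc;
}
```

With `x[0] = 1+2i` and `z[0] = 3+4i`, the lane receives `(1-2i)(3-4i) = -5-10i`, the conjugate of `x[0] * z[0] = -5+10i`.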

v4cfloat	fpmac_cn (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmac_cn (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmac_cn (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmac_cn (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmac_cn (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
	Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmac_cn (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmac_cn (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex conjugate times complex floating point vectors.
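Conjugating only the first operand, as `fpmac_cn` does, is the per-lane step of a complex inner product `<x, z> = sum conj(x[i]) * z[i]`. The sketch below models `fpmac_cn(v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)` under the assumption that lane i pairs `xbuf[i]` with `zbuf[i]`, then reduces the lanes; `ref_fpmac_cn` and `inner_product` are hypothetical helpers.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using cf = std::complex<float>;
using v4cf = std::array<cf, 4>;  // stand-in for v4cfloat (assumption)

// Hypothetical model: ret[i] = acc[i] + conj(x[i]) * z[i].
v4cf ref_fpmac_cn(v4cf acc, const v4cf& x, const v4cf& z) {
    for (std::size_t i = 0; i < acc.size(); ++i)
        acc[i] += std::conj(x[i]) * z[i];
    return acc;
}

// Usage: sum the accumulator lanes to obtain <x, z>.
cf inner_product(const v4cf& x, const v4cf& z) {
    v4cf acc = ref_fpmac_cn(v4cf{}, x, z);
    cf sum(0.0f, 0.0f);
    for (const cf& lane : acc) sum += lane;
    return sum;
}
```

When `z == x`, each lane contributes `|x[i]|^2`, so the result is real.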

v8float	fpmac_conf (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v8float	fpmac_conf (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply and accumulate for single precision floating point vectors.

v4cfloat	fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply and accumulate for single precision floating point vectors.
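The overview describes fpmac_conf as the fully configurable operation and the named intrinsics as convenience wrappers around it. The sketch below illustrates that design with a hypothetical scalar kernel, `ref_conf`, that models only two behaviours: taking the absolute value of the product (the role suggested by the `abs` flag) and subtracting instead of adding (assumed here to correspond to some value of `addmode`). The `ones`, `addmask`, `cmpmode` and `cmp` parameters are not modelled because their semantics are not described in this section, and the wrapper names are illustrative only.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using v8 = std::array<float, 8>;  // stand-in for v8float (assumption)

// Hypothetical configurable kernel: ret[i] = acc[i] +/- f(x[i] * z[i]),
// where f is fabs when use_abs is set. ASSUMPTION: 'subtract' stands in
// for the addmode setting the real wrappers would pass.
v8 ref_conf(v8 acc, const v8& x, const v8& z, bool use_abs, bool subtract) {
    for (std::size_t i = 0; i < acc.size(); ++i) {
        float p = x[i] * z[i];
        if (use_abs) p = std::fabs(p);
        acc[i] = subtract ? acc[i] - p : acc[i] + p;
    }
    return acc;
}

// Thin wrappers in the spirit of fpmac / fpmsc / fpmac_abs / fpmsc_abs.
v8 ref_mac(const v8& a, const v8& x, const v8& z)     { return ref_conf(a, x, z, false, false); }
v8 ref_msc(const v8& a, const v8& x, const v8& z)     { return ref_conf(a, x, z, false, true); }
v8 ref_mac_abs(const v8& a, const v8& x, const v8& z) { return ref_conf(a, x, z, true, false); }
v8 ref_msc_abs(const v8& a, const v8& x, const v8& z) { return ref_conf(a, x, z, true, true); }
```

Keeping one configurable kernel and deriving the named operations from fixed flag combinations means the wrappers cannot drift apart in behaviour; only the flags differ.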

v4cfloat	fpmac_nc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmac_nc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmac_nc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmac_nc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmac_nc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
	Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmac_nc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmac_nc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and accumulate for single precision complex times complex conjugate floating point vectors.
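`fpmac_nc` conjugates the second operand instead of the first. Swapping which operand is conjugated conjugates the product: `x * conj(z) == conj(conj(x) * z)`. The sketch below models `fpmac_nc(v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)`, assuming lane i pairs `xbuf[i]` with `zbuf[i]`; `ref_fpmac_nc` is a hypothetical name.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using cf = std::complex<float>;
using v4cf = std::array<cf, 4>;  // stand-in for v4cfloat (assumption)

// Hypothetical model: ret[i] = acc[i] + x[i] * conj(z[i]).
v4cf ref_fpmac_nc(v4cf acc, const v4cf& x, const v4cf& z) {
    for (std::size_t i = 0; i < acc.size(); ++i)
        acc[i] += x[i] * std::conj(z[i]);
    return acc;
}
```

With `x[0] = 1+2i` and `z[0] = 3+4i`, the lane receives `(1+2i)(3-4i) = 11+2i`, the conjugate of the `fpmac_cn` result `11-2i` for the same inputs.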

v8float	fpmsc (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times real floating point vectors.

v8float	fpmsc (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times real floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times complex floating point vectors.

v8float	fpmsc (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times real floating point vectors.

v8float	fpmsc (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times real floating point vectors.

v8float	fpmsc (v8float acc, v8float xbuf, v8float zbuf)
	Multiply and subtract for single precision real times real floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times complex floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v4float xbuf, v4cfloat zbuf)
	Multiply and subtract for single precision real times complex floating point vectors.

v8float	fpmsc (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times real floating point vectors.

v8float	fpmsc (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times real floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times complex floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times real floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times real floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v4cfloat xbuf, v8float zbuf)
	Multiply and subtract for single precision complex times real floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
	Multiply and subtract for single precision complex times complex floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times real floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex floating point vectors.

v4cfloat	fpmsc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex floating point vectors.
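`fpmsc` mirrors `fpmac` but subtracts each product from the accumulator. The sketch below models the complex-times-complex form `fpmsc(v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)`, assuming lane i pairs `xbuf[i]` with `zbuf[i]`; `ref_fpmsc` and `ref_fpmac` are hypothetical names.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using cf = std::complex<float>;
using v4cf = std::array<cf, 4>;  // stand-in for v4cfloat (assumption)

// Hypothetical model of fpmsc: ret[i] = acc[i] - x[i] * z[i].
v4cf ref_fpmsc(v4cf acc, const v4cf& x, const v4cf& z) {
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] -= x[i] * z[i];
    return acc;
}

// Hypothetical model of fpmac, for comparison: ret[i] = acc[i] + x[i] * z[i].
v4cf ref_fpmac(v4cf acc, const v4cf& x, const v4cf& z) {
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] += x[i] * z[i];
    return acc;
}
```

For small integer-valued inputs like the ones below, the arithmetic is exact, so applying `ref_fpmsc` after `ref_fpmac` with the same operands restores the original accumulator; with general floating-point values, rounding may leave a small residue.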

v8float	fpmsc_abs (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply, take absolute value and subtract for single precision real times real floating point vectors.

v8float	fpmsc_abs (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply, take absolute value and subtract for single precision real times real floating point vectors.

v8float	fpmsc_abs (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply, take absolute value and subtract for single precision real times real floating point vectors.

v8float	fpmsc_abs (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply, take absolute value and subtract for single precision real times real floating point vectors.

v8float	fpmsc_abs (v8float acc, v8float xbuf, v8float zbuf)
	Multiply, take absolute value and subtract for single precision real times real floating point vectors.

v8float	fpmsc_abs (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply, take absolute value and subtract for single precision real times real floating point vectors.

v8float	fpmsc_abs (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply, take absolute value and subtract for single precision real times real floating point vectors.

v4cfloat	fpmsc_c (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmsc_c (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmsc_c (v4cfloat acc, v4float xbuf, v4cfloat zbuf)
	Multiply and subtract for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmsc_c (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmsc_c (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmsc_c (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmsc_c (v4cfloat acc, v4cfloat xbuf, v8float zbuf)
	Multiply and subtract for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmsc_c (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times real floating point vectors.
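
The fpmsc_c overloads come in two shapes. A real xbuf against a complex zbuf conjugates the z lanes, and a complex xbuf against a real zbuf conjugates the x lanes. Below is a minimal C++ sketch of both for the four complex output lanes. It assumes lane i reads index start + ((offs >> 4*i) & 0xF), wrapped modulo the buffer length. This list does not say how the real zbuf of the complex-x overloads maps onto four complex lanes, so indexing zbuf once per complex lane is also an assumption. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<float>;

// Assumed lane selection: lane `lane` reads start + (4-bit nibble `lane` of offs), wrapped.
std::size_t lane_index(int start, unsigned int offs, int lane, std::size_t len) {
    assert(len > 0);  // the modulo below needs a non-empty buffer
    long idx = static_cast<long>(start) + static_cast<long>((offs >> (4 * lane)) & 0xFu);
    long n = static_cast<long>(len);
    return static_cast<std::size_t>(((idx % n) + n) % n);
}

// Hypothetical model of the real-xbuf overloads: out[i] = acc[i] - x * conj(z).
std::array<cplx, 4> fpmsc_c_real_x(const std::array<cplx, 4>& acc,
                                   const std::vector<float>& xbuf, int xstart, unsigned int xoffs,
                                   const std::vector<cplx>& zbuf, int zstart, unsigned int zoffs) {
    std::array<cplx, 4> out{};
    for (int i = 0; i < 4; ++i) {
        float x = xbuf[lane_index(xstart, xoffs, i, xbuf.size())];
        cplx z = zbuf[lane_index(zstart, zoffs, i, zbuf.size())];
        out[i] = acc[i] - x * std::conj(z);
    }
    return out;
}

// Hypothetical model of the complex-xbuf overloads: out[i] = acc[i] - conj(x) * z.
// Assumption: the real zbuf is indexed once per complex lane, like xbuf.
std::array<cplx, 4> fpmsc_c_cplx_x(const std::array<cplx, 4>& acc,
                                   const std::vector<cplx>& xbuf, int xstart, unsigned int xoffs,
                                   const std::vector<float>& zbuf, int zstart, unsigned int zoffs) {
    std::array<cplx, 4> out{};
    for (int i = 0; i < 4; ++i) {
        cplx x = xbuf[lane_index(xstart, xoffs, i, xbuf.size())];
        float z = zbuf[lane_index(zstart, zoffs, i, zbuf.size())];
        out[i] = acc[i] - std::conj(x) * z;
    }
    return out;
}
```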

v4cfloat	fpmsc_cc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmsc_cc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmsc_cc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmsc_cc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmsc_cc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
	Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmsc_cc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmsc_cc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.
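
Conjugating both operands is the same as conjugating the ordinary product, conj(x) * conj(z) = conj(x * z). Each fpmsc_cc lane can therefore be read as acc minus the conjugate of x * z. The sketch below checks that identity with a hypothetical per-lane C++ model (msc_cc_lanes is an illustrative name). Lane selection is left out, so x[i] and z[i] stand for the already-selected lanes.

```cpp
#include <array>
#include <complex>

using cplx = std::complex<float>;

// Hypothetical per-lane model of fpmsc_cc: out[i] = acc[i] - conj(x[i]) * conj(z[i]).
// Lane selection is omitted; x[i] and z[i] are the already-selected lanes.
std::array<cplx, 4> msc_cc_lanes(const std::array<cplx, 4>& acc,
                                 const std::array<cplx, 4>& x,
                                 const std::array<cplx, 4>& z) {
    std::array<cplx, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = acc[i] - std::conj(x[i]) * std::conj(z[i]);
    }
    return out;
}
```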

v4cfloat	fpmsc_cn (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmsc_cn (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmsc_cn (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmsc_cn (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmsc_cn (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
	Multiply and subtract for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmsc_cn (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmsc_cn (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex conjugate times complex floating point vectors.
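
For fpmsc_cn the lane product conj(x) * z has phase arg(z) - arg(x), which is why this form is common in phase-difference and correlation kernels. When x and z are the same sample, the product reduces to the real power |x|^2. Below is a hypothetical per-lane C++ model (msc_cn_lanes is an illustrative name; lane selection is omitted).

```cpp
#include <array>
#include <complex>

using cplx = std::complex<float>;

// Hypothetical per-lane model of fpmsc_cn: out[i] = acc[i] - conj(x[i]) * z[i].
// Lane selection is omitted; x[i] and z[i] are the already-selected lanes.
std::array<cplx, 4> msc_cn_lanes(const std::array<cplx, 4>& acc,
                                 const std::array<cplx, 4>& x,
                                 const std::array<cplx, 4>& z) {
    std::array<cplx, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = acc[i] - std::conj(x[i]) * z[i];
    }
    return out;
}
```

In lane 0 of the check below, conj(i) * (-1) = i, so the phase difference is pi/2. In lane 2, x equals z and the subtracted term is |3 + 4i|^2 = 25.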

v4cfloat	fpmsc_nc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmsc_nc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmsc_nc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmsc_nc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmsc_nc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
	Multiply and subtract for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmsc_nc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmsc_nc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and subtract for single precision complex times complex conjugate floating point vectors.
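
fpmsc_nc conjugates z instead of x. Complex multiplication commutes, so x * conj(z) is exactly the fpmsc_cn product with the operands swapped. It also equals conj(conj(x) * z). The sketch below is a hypothetical per-lane C++ model that checks the swap (msc_nc_lanes is an illustrative name; lane selection is omitted).

```cpp
#include <array>
#include <complex>

using cplx = std::complex<float>;

// Hypothetical per-lane model of fpmsc_nc: out[i] = acc[i] - x[i] * conj(z[i]).
// Lane selection is omitted; x[i] and z[i] are the already-selected lanes.
std::array<cplx, 4> msc_nc_lanes(const std::array<cplx, 4>& acc,
                                 const std::array<cplx, 4>& x,
                                 const std::array<cplx, 4>& z) {
    std::array<cplx, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = acc[i] - x[i] * std::conj(z[i]);
    }
    return out;
}
```

The swap is only a drop-in replacement for the three-argument forms, where xbuf and zbuf are both v4cfloat. In the indexed forms xbuf may be wider than zbuf, so the two operands are not interchangeable there.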

v8float	fpmul (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision real times real floating point vectors.

v8float	fpmul (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision real times real floating point vectors.

v4cfloat	fpmul (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision real times complex floating point vectors.

v8float	fpmul (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision real times real floating point vectors.

v8float	fpmul (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision real times real floating point vectors.

v8float	fpmul (v8float xbuf, v8float zbuf)
	Multiply for single precision real times real floating point vectors.

v4cfloat	fpmul (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision real times complex floating point vectors.

v4cfloat	fpmul (v4float xbuf, v4cfloat zbuf)
	Multiply for single precision real times complex floating point vectors.

v8float	fpmul (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision real times real floating point vectors.

v8float	fpmul (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision real times real floating point vectors.

v4cfloat	fpmul (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision real times complex floating point vectors.

v4cfloat	fpmul (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex times real floating point vectors.

v4cfloat	fpmul (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex floating point vectors.

v4cfloat	fpmul (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex floating point vectors.

v4cfloat	fpmul (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex times real floating point vectors.

v4cfloat	fpmul (v4cfloat xbuf, v8float zbuf)
	Multiply for single precision complex times real floating point vectors.

v4cfloat	fpmul (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex floating point vectors.

v4cfloat	fpmul (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex floating point vectors.

v4cfloat	fpmul (v4cfloat xbuf, v4cfloat zbuf)
	Multiply for single precision complex times complex floating point vectors.

v4cfloat	fpmul (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex times real floating point vectors.

v4cfloat	fpmul (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex floating point vectors.

v4cfloat	fpmul (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex floating point vectors.
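
The start and offset arguments let one operand slide across a wide xbuf while the other is broadcast, which is a typical building block for an FIR filter. Below is a minimal C++ model of the real times real fpmul overloads. It assumes lane i reads start + ((offs >> 4*i) & 0xF), wrapped modulo the buffer length. With xoffs = 0x76543210 and zoffs = 0, each call yields one filter tap, out[i] = x[n + k + i] * h[k]. The names fpmul_model and fir_tap are hypothetical, and the correlation-form filter y[n + i] = sum over k of h[k] * x[n + i + k] is chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed lane selection: lane `lane` reads start + (4-bit nibble `lane` of offs), wrapped.
std::size_t lane_index(int start, unsigned int offs, int lane, std::size_t len) {
    assert(len > 0);  // the modulo below needs a non-empty buffer
    long idx = static_cast<long>(start) + static_cast<long>((offs >> (4 * lane)) & 0xFu);
    long n = static_cast<long>(len);
    return static_cast<std::size_t>(((idx % n) + n) % n);
}

// Hypothetical model of the real x real fpmul overloads: out[i] = x[xi] * z[zi].
std::array<float, 8> fpmul_model(const std::vector<float>& xbuf, int xstart, unsigned int xoffs,
                                 const std::vector<float>& zbuf, int zstart, unsigned int zoffs) {
    std::array<float, 8> out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = xbuf[lane_index(xstart, xoffs, i, xbuf.size())] *
                 zbuf[lane_index(zstart, zoffs, i, zbuf.size())];
    }
    return out;
}

// One filter tap for eight consecutive outputs: out[i] = x[n + k + i] * h[k].
// xoffs = 0x76543210 walks x lane by lane; zoffs = 0 broadcasts h[k] to all lanes.
std::array<float, 8> fir_tap(const std::vector<float>& x, int n,
                             const std::vector<float>& h, int k) {
    return fpmul_model(x, n + k, 0x76543210u, h, k, 0u);
}
```

In a real kernel only the first tap would typically use fpmul. The remaining taps would accumulate with fpmac rather than with the explicit summation loop a caller of this model would write.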

v4cfloat	fpmul_c (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmul_c (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmul_c (v4float xbuf, v4cfloat zbuf)
	Multiply for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmul_c (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision real times complex conjugate floating point vectors.

v4cfloat	fpmul_c (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmul_c (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmul_c (v4cfloat xbuf, v8float zbuf)
	Multiply for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmul_c (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times real floating point vectors.

v4cfloat	fpmul_cc (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmul_cc (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmul_cc (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmul_cc (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmul_cc (v4cfloat xbuf, v4cfloat zbuf)
	Multiply for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmul_cc (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmul_cc (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpmul_cn (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmul_cn (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmul_cn (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmul_cn (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmul_cn (v4cfloat xbuf, v4cfloat zbuf)
	Multiply for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmul_cn (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpmul_cn (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex conjugate times complex floating point vectors.

v8float	fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v8float	fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
	Fully configurable multiply for single precision floating point vectors.

v4cfloat	fpmul_nc (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmul_nc (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmul_nc (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmul_nc (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmul_nc (v4cfloat xbuf, v4cfloat zbuf)
	Multiply for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmul_nc (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpmul_nc (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply for single precision complex times complex conjugate floating point vectors.

v8float	fpneg_abs_mul (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply, take absolute value and negate for single precision real times real floating point vectors.

v8float	fpneg_abs_mul (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply, take absolute value and negate for single precision real times real floating point vectors.

v8float	fpneg_abs_mul (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply, take absolute value and negate for single precision real times real floating point vectors.

v8float	fpneg_abs_mul (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply, take absolute value and negate for single precision real times real floating point vectors.

v8float	fpneg_abs_mul (v8float xbuf, v8float zbuf)
	Multiply, take absolute value and negate for single precision real times real floating point vectors.

v8float	fpneg_abs_mul (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply, take absolute value and negate for single precision real times real floating point vectors.

v8float	fpneg_abs_mul (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply, take absolute value and negate for single precision real times real floating point vectors.
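
fpneg_abs_mul returns -|x * z|, so every lane is less than or equal to zero regardless of the operand signs. It is the negation of the fpabs_mul result. Below are hypothetical per-lane C++ models of both (abs_mul_lanes and neg_abs_mul_lanes are illustrative names; lane selection is omitted).

```cpp
#include <array>
#include <cmath>

// Hypothetical per-lane models; lane selection is omitted, x[i] and z[i] are the selected lanes.
// fpabs_mul:     out[i] =  |x[i] * z[i]|
std::array<float, 8> abs_mul_lanes(const std::array<float, 8>& x, const std::array<float, 8>& z) {
    std::array<float, 8> out{};
    for (int i = 0; i < 8; ++i) out[i] = std::fabs(x[i] * z[i]);
    return out;
}

// fpneg_abs_mul: out[i] = -|x[i] * z[i]|
std::array<float, 8> neg_abs_mul_lanes(const std::array<float, 8>& x, const std::array<float, 8>& z) {
    std::array<float, 8> out{};
    for (int i = 0; i < 8; ++i) out[i] = -std::fabs(x[i] * z[i]);
    return out;
}
```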

v8float	fpneg_mul (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times real floating point vectors.

v8float	fpneg_mul (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times real floating point vectors.

v4cfloat	fpneg_mul (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times complex floating point vectors.

v8float	fpneg_mul (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times real floating point vectors.

v8float	fpneg_mul (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times real floating point vectors.

v8float	fpneg_mul (v8float xbuf, v8float zbuf)
	Multiply and negate for single precision real times real floating point vectors.

v4cfloat	fpneg_mul (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times complex floating point vectors.

v4cfloat	fpneg_mul (v4float xbuf, v4cfloat zbuf)
	Multiply and negate for single precision real times complex floating point vectors.

v8float	fpneg_mul (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times real floating point vectors.

v8float	fpneg_mul (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times real floating point vectors.

v4cfloat	fpneg_mul (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times complex floating point vectors.

v4cfloat	fpneg_mul (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times real floating point vectors.

v4cfloat	fpneg_mul (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex floating point vectors.

v4cfloat	fpneg_mul (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex floating point vectors.

v4cfloat	fpneg_mul (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times real floating point vectors.

v4cfloat	fpneg_mul (v4cfloat xbuf, v8float zbuf)
	Multiply and negate for single precision complex times real floating point vectors.

v4cfloat	fpneg_mul (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex floating point vectors.

v4cfloat	fpneg_mul (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex floating point vectors.

v4cfloat	fpneg_mul (v4cfloat xbuf, v4cfloat zbuf)
	Multiply and negate for single precision complex times complex floating point vectors.

v4cfloat	fpneg_mul (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times real floating point vectors.

v4cfloat	fpneg_mul (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex floating point vectors.

v4cfloat	fpneg_mul (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex floating point vectors.
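
fpneg_mul returns -(x * z) per lane. Negation can be moved onto either operand, -(x * z) = (-x) * z, for real times complex and complex times complex operands alike. Below are hypothetical per-lane C++ models of those two overload shapes (neg_mul_real_cplx and neg_mul_cplx_cplx are illustrative names; lane selection is omitted).

```cpp
#include <array>
#include <complex>

using cplx = std::complex<float>;

// Hypothetical per-lane model of the real x complex fpneg_mul overloads: out[i] = -(x[i] * z[i]).
std::array<cplx, 4> neg_mul_real_cplx(const std::array<float, 4>& x, const std::array<cplx, 4>& z) {
    std::array<cplx, 4> out{};
    for (int i = 0; i < 4; ++i) out[i] = -(x[i] * z[i]);
    return out;
}

// Hypothetical per-lane model of the complex x complex fpneg_mul overloads: out[i] = -(x[i] * z[i]).
std::array<cplx, 4> neg_mul_cplx_cplx(const std::array<cplx, 4>& x, const std::array<cplx, 4>& z) {
    std::array<cplx, 4> out{};
    for (int i = 0; i < 4; ++i) out[i] = -(x[i] * z[i]);
    return out;
}
```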

v4cfloat	fpneg_mul_c (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_c (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_c (v4float xbuf, v4cfloat zbuf)
	Multiply and negate for single precision real times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_c (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision real times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_c (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times real floating point vectors.

v4cfloat	fpneg_mul_c (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times real floating point vectors.

v4cfloat	fpneg_mul_c (v4cfloat xbuf, v8float zbuf)
	Multiply and negate for single precision complex conjugate times real floating point vectors.

v4cfloat	fpneg_mul_c (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times real floating point vectors.

v4cfloat	fpneg_mul_cc (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_cc (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_cc (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_cc (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_cc (v4cfloat xbuf, v4cfloat zbuf)
	Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_cc (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_cc (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_cn (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpneg_mul_cn (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpneg_mul_cn (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpneg_mul_cn (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpneg_mul_cn (v4cfloat xbuf, v4cfloat zbuf)
	Multiply and negate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpneg_mul_cn (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpneg_mul_cn (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex conjugate times complex floating point vectors.

v4cfloat	fpneg_mul_nc (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_nc (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_nc (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_nc (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_nc (v4cfloat xbuf, v4cfloat zbuf)
	Multiply and negate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_nc (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex conjugate floating point vectors.

v4cfloat	fpneg_mul_nc (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
	Multiply and negate for single precision complex times complex conjugate floating point vectors.
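The suffixes of the fpneg_mul_* wrappers above encode which operand is conjugated. Going by the one-line descriptions, _cn conjugates X, _nc conjugates Z, _cc conjugates both, and _c conjugates the complex operand of a real/complex pair. The product is then negated. The host-side C sketch below models one lane of each case and leaves lane selection out. The cf type and the neg_mul_* helpers are hypothetical names, not intrinsics.

```c
#include <assert.h>

/* One complex lane: real part followed by imaginary part (hypothetical type). */
typedef struct { float re, im; } cf;

static cf conj_cf(cf a) { cf r = { a.re, -a.im }; return r; }
static cf neg_cf(cf a)  { cf r = { -a.re, -a.im }; return r; }

static cf mul_cf(cf a, cf b) {
    cf r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

/* Per-lane models of the naming scheme (hypothetical helpers). */
static cf neg_mul_cn(cf x, cf z) { return neg_cf(mul_cf(conj_cf(x), z)); }          /* -(conj(x) * z)       */
static cf neg_mul_nc(cf x, cf z) { return neg_cf(mul_cf(x, conj_cf(z))); }          /* -(x * conj(z))       */
static cf neg_mul_cc(cf x, cf z) { return neg_cf(mul_cf(conj_cf(x), conj_cf(z))); } /* -(conj(x) * conj(z)) */

/* Real X times conjugated complex Z, negated: -(x * conj(z)). */
static cf neg_mul_c(float x, cf z) { cf r = { -x * z.re, x * z.im }; return r; }
```

For x = 1 + 2i and z = 3 + 4i, the three complex variants give -11 + 2i, -11 - 2i and 5 + 10i, which shows that the choice of conjugated operand changes the sign of the imaginary part, not just the overall sign.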

Function Documentation

v8float fpabs_mul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.
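The lane selection in the loop above can be modeled on a host CPU, for example to check offset words before using them in a kernel. This C sketch assumes that lane i's 4-bit offset occupies bits 4i to 4i+3 of xoffs/zoffs (lane 0 in the least significant nibble, so 0x76543210 is the identity mapping). The documentation only states "8 x 4 bits", so treat that ordering, and the helper names lane_offset and ref_fpabs_mul, as hypothetical. Out-of-range indices are rejected with assert rather than modeled.

```c
#include <assert.h>

/* Assumed layout: lane i's offset is in bits [4*i+3 : 4*i]. */
static unsigned lane_offset(unsigned int offs, int lane) {
    return (offs >> (4 * lane)) & 0xFu;
}

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Models: ret[i] = abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]), i = 0..7 */
static void ref_fpabs_mul(const float *xbuf, int xlen, int xstart, unsigned int xoffs,
                          const float *zbuf, int zlen, int zstart, unsigned int zoffs,
                          float ret[8]) {
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)lane_offset(xoffs, i);
        int zi = zstart + (int)lane_offset(zoffs, i);
        /* Out-of-range indices are not modeled here. */
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < zlen);
        ret[i] = abs_f(xbuf[xi] * zbuf[zi]);
    }
}
```

Under the same assumption, xoffs = 0x01234567 reads X in reverse lane order: lane 0 then uses xbuf[xstart + 7] and lane 7 uses xbuf[xstart].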

v8float fpabs_mul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v8float fpabs_mul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpabs_mul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.
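In this overload both multiplicands come from xbuf, so zstart and zoffs index into X as well. Because xbuf holds 16 values, zstart can be shifted relative to xstart. The host-side sketch below, with the hypothetical name ref_fpabs_mul_x and the assumption that lane i's nibble sits in bits 4i to 4i+3, shows two uses. Offsetting zstart by one pairs each element with its neighbor, and matching starts and offsets give squared magnitudes.

```c
#include <assert.h>

/* Assumed layout: lane i's offset is in bits [4*i+3 : 4*i]. */
static unsigned nib(unsigned int offs, int lane) { return (offs >> (4 * lane)) & 0xFu; }

/* Models: ret[i] = abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]), i = 0..7 */
static void ref_fpabs_mul_x(const float *xbuf, int xlen, int xstart, unsigned int xoffs,
                            int zstart, unsigned int zoffs, float ret[8]) {
    for (int i = 0; i < 8; i++) {
        int a = xstart + (int)nib(xoffs, i);   /* first multiplicand index  */
        int b = zstart + (int)nib(zoffs, i);   /* second multiplicand index */
        assert(a >= 0 && a < xlen && b >= 0 && b < xlen);
        float p = xbuf[a] * xbuf[b];
        ret[i] = p < 0.0f ? -p : p;
    }
}
```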

v8float fpabs_mul	(	v8float	xbuf,
		v8float	zbuf
	)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = abs(xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.
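Comparing the loops, this elementwise overload computes the same lanes as the offset-based overloads when xstart = zstart = 0 and each lane's offset field equals its lane number. Building offset words by hand is error-prone. The hypothetical pack_offsets helper below packs eight per-lane values, assuming lane i sits in bits 4i to 4i+3 (lane 0 least significant). Under that assumption, the identity mapping is 0x76543210.

```c
#include <assert.h>

/* Hypothetical helper: packs eight 4-bit lane offsets into one word,
 * assuming lane i occupies bits [4*i+3 : 4*i]. */
static unsigned int pack_offsets(const unsigned char lane_offs[8]) {
    unsigned int packed = 0;
    for (int i = 0; i < 8; i++) {
        assert(lane_offs[i] <= 0xF);   /* each lane offset must fit in 4 bits */
        packed |= (unsigned int)lane_offs[i] << (4 * i);
    }
    return packed;
}
```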

v8float fpabs_mul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpabs_mul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v8float fpmac	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmac	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v4cfloat fpmac	(	v4cfloat	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
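In this real-times-complex form, X is indexed in real (float) lanes, while Z's start value and offsets count complex lanes. The sketch below models that on the host. It assumes Z and the accumulator are stored as interleaved (re, im) float pairs, which is one reading of "lane 0 corresponds to the first real and imaginary value". It also assumes lane i's nibble sits in bits 4i to 4i+3. ref_fpmac_rc is a hypothetical name. Each output lane scales both parts of the selected complex value by one real value.

```c
#include <assert.h>

/* Assumed layout: lane i's offset is in bits [4*i+3 : 4*i]. */
static unsigned nib(unsigned int offs, int lane) { return (offs >> (4 * lane)) & 0xFu; }

/* Models ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]], i = 0..3.
 * xbuf: real floats. zbuf: complex lanes stored as interleaved (re, im) float pairs.
 * acc: 4 complex lanes, also interleaved (8 floats), updated in place. */
static void ref_fpmac_rc(float acc[8], const float *xbuf, int xlen, int xstart, unsigned int xoffs,
                         const float *zbuf, int zlanes, int zstart, unsigned int zoffs) {
    for (int i = 0; i < 4; i++) {
        unsigned zo = nib(zoffs, i);
        assert(zo <= 7);                        /* highest bit of each Z nibble must be 0 */
        int a = xstart + (int)nib(xoffs, i);    /* index in real (float) lanes */
        int k = zstart + (int)zo;               /* index in complex lanes */
        assert(a >= 0 && a < xlen && k >= 0 && k < zlanes);
        acc[2 * i]     += xbuf[a] * zbuf[2 * k];      /* real part */
        acc[2 * i + 1] += xbuf[a] * zbuf[2 * k + 1];  /* imaginary part */
    }
}
```

With zoffs = 0x1010, lanes 0 and 2 read complex lane 0 of Z and lanes 1 and 3 read complex lane 1.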

v8float fpmac	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.
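Because xbuf here holds 16 values while only 8 lanes are produced, xstart can slide a window across the buffer. Setting every Z nibble to 0 broadcasts zbuf[zstart] to all lanes. The host-side C sketch below combines the two into a 3-tap FIR. ref_fpmac, fir3, and the nibble ordering (lane i in bits 4i to 4i+3, lane 0 least significant) are assumptions made for illustration, not part of the documented API.

```c
#include <assert.h>

/* Assumed layout: lane i's offset is in bits [4*i+3 : 4*i]. */
static unsigned nib(unsigned int offs, int lane) { return (offs >> (4 * lane)) & 0xFu; }

/* Host-side model: acc[i] += xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]], i = 0..7 */
static void ref_fpmac(float acc[8], const float *xbuf, int xlen, int xstart, unsigned int xoffs,
                      const float *zbuf, int zlen, int zstart, unsigned int zoffs) {
    for (int i = 0; i < 8; i++) {
        int a = xstart + (int)nib(xoffs, i);
        int b = zstart + (int)nib(zoffs, i);
        assert(a >= 0 && a < xlen && b >= 0 && b < zlen);
        acc[i] += xbuf[a] * zbuf[b];
    }
}

/* Hypothetical 3-tap FIR producing 8 outputs: y[i] = c[0]*x[i] + c[1]*x[i+1] + c[2]*x[i+2].
 * Each tap slides the X window by one (xstart = k, identity xoffs) and
 * broadcasts coefficient c[k] to all lanes (zstart = k, zoffs = 0). */
static void fir3(const float x[16], const float c[8], float y[8]) {
    for (int i = 0; i < 8; i++) y[i] = 0.0f;
    for (int k = 0; k < 3; k++)
        ref_fpmac(y, x, 16, k, 0x76543210u, c, 8, k, 0x00000000u);
}
```

On the device, zstart must be a compile-time constant, so the tap loop would be unrolled with literal zstart values instead of being written as a runtime loop.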

v8float fpmac	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v8float fpmac	(	v8float	acc,
		v8float	xbuf,
		v8float	zbuf
	)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmac	(	v4cfloat	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac	(	v4cfloat	acc,
		v4float	xbuf,
		v4cfloat	zbuf
	)

Multiply and accumulate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v8float fpmac	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmac	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v4cfloat fpmac	(	v4cfloat	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmac	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmac	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v8float	zbuf
	)

Multiply and accumulate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmac	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.
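The note above says this intrinsic takes two cycles. One way to see why two passes suffice is to split the complex product into two real-times-complex terms: x * z = x.re * z + x.im * (i * z). The C sketch below checks on the host that this split matches the direct formula. The cf type and function names are hypothetical, and the documentation does not say which split the microcode actually uses. For general inputs the two versions can differ in the last bit, because the additions are reassociated. The integer-valued test inputs avoid that.

```c
#include <assert.h>

/* One complex lane: real part followed by imaginary part (hypothetical type). */
typedef struct { float re, im; } cf;

/* Direct model of the documented loop: acc[i] += xbuf[i] * zbuf[i], i = 0..3. */
static void ref_fpmac_cc_direct(cf acc[4], const cf x[4], const cf z[4]) {
    for (int i = 0; i < 4; i++) {
        acc[i].re += x[i].re * z[i].re - x[i].im * z[i].im;
        acc[i].im += x[i].re * z[i].im + x[i].im * z[i].re;
    }
}

/* Same result split into two real-times-complex passes:
 * pass 1 adds x.re * z, pass 2 adds x.im * (i * z) = x.im * (-z.im, z.re). */
static void ref_fpmac_cc_two_pass(cf acc[4], const cf x[4], const cf z[4]) {
    for (int i = 0; i < 4; i++) {          /* pass 1 */
        acc[i].re += x[i].re * z[i].re;
        acc[i].im += x[i].re * z[i].im;
    }
    for (int i = 0; i < 4; i++) {          /* pass 2 */
        acc[i].re += x[i].im * -z[i].im;
        acc[i].im += x[i].im * z[i].re;
    }
}
```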

v4cfloat fpmac	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmac	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v8float fpmac_abs	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.
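A scalar C model of this variant helps when choosing xoffs and zoffs. The sketch assumes lane i's offset occupies bits 4i to 4i+3 (so 0x76543210 would select elements 0..7 in order); ref_fpmac_abs is a hypothetical name, and the xlen parameter lets the same model stand in for the v16float and v32float overloads:

```c
#include <assert.h>

/* Scalar model: ret[i] = acc[i] + |xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]|.
 * xlen is the number of floats in xbuf (8, 16 or 32); zbuf has 8 floats. */
static void ref_fpmac_abs(float ret[8], const float acc[8],
                          const float *xbuf, int xlen, int xstart, unsigned int xoffs,
                          const float zbuf[8], int zstart, unsigned int zoffs)
{
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < 8); /* wrap-around not modeled */
        float p = xbuf[xi] * zbuf[zi];
        ret[i] = acc[i] + (p < 0.0f ? -p : p); /* abs of the product, then accumulate */
    }
}
```

The absolute value is applied to each product before it is added, so a negative acc[i] is not made positive by this operation.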

v8float fpmac_abs	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into X for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset into X for the second multiplicand.
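When offsets are computed on the host, it can help to pack them programmatically. The sketch below packs eight 4-bit offsets into one word, assuming lane 0 occupies the lowest nibble, and models the single-buffer form where both factors come from xbuf. pack_offsets8 and ref_fpmac_abs_1buf are hypothetical names:

```c
#include <assert.h>

/* Hypothetical helper: packs eight 4-bit lane offsets, lane 0 in the lowest nibble. */
static unsigned int pack_offsets8(const unsigned char offs[8])
{
    unsigned int w = 0;
    for (int i = 0; i < 8; i++) {
        assert(offs[i] <= 0xF);
        w |= (unsigned int)offs[i] << (4 * i);
    }
    return w;
}

/* Scalar model: ret[i] = acc[i] + |xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]|,
 * where xlen is the number of floats in xbuf. */
static void ref_fpmac_abs_1buf(float ret[8], const float acc[8],
                               const float *xbuf, int xlen,
                               int xstart, unsigned int xoffs,
                               int zstart, unsigned int zoffs)
{
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < xlen);
        float p = xbuf[xi] * xbuf[zi];
        ret[i] = acc[i] + (p < 0.0f ? -p : p);
    }
}
```

Pairing forward offsets {0,...,7} for X with reversed offsets {7,...,0} for the second multiplicand multiplies each element by its mirror image, x[i] * x[7 - i].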

v8float fpmac_abs	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmac_abs	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into X for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset into X for the second multiplicand.

v8float fpmac_abs	(	v8float	acc,
		v8float	xbuf,
		v8float	zbuf
	)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + abs(xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.
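Because the absolute value is taken per product, repeated calls build a lane-wise sum of |x * z| that can be reduced across the 8 lanes at the end. A scalar sketch of that pattern; the function names are hypothetical and the input length is assumed to be a multiple of 8:

```c
#include <assert.h>

/* Scalar model of the offset-free form: acc[i] += |x[i] * z[i]|. */
static void ref_fpmac_abs_simple(float acc[8], const float x[8], const float z[8])
{
    for (int i = 0; i < 8; i++) {
        float p = x[i] * z[i];
        acc[i] += (p < 0.0f ? -p : p);
    }
}

/* Sum of |x[k] * z[k]| over n elements: accumulate 8 lanes at a time,
 * then add the lanes together. */
static float sum_abs_products(const float *x, const float *z, int n)
{
    assert(n % 8 == 0);
    float acc[8] = {0};
    for (int k = 0; k < n; k += 8)
        ref_fpmac_abs_simple(acc, x + k, z + k);
    float s = 0.0f;
    for (int i = 0; i < 8; i++)
        s += acc[i];
    return s;
}
```

With alternating-sign inputs a plain dot product can cancel to zero, whereas this sum counts every product's magnitude.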

v8float fpmac_abs	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmac_abs	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into X for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset into X for the second multiplicand.

v4cfloat fpmac_c	(	v4cfloat	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
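For real times complex conjugate, each lane scales the selected complex value by a real number and negates the imaginary part: r * conj(z) = r*Re(z) - i*r*Im(z). A scalar sketch, assuming lane i's offsets sit in bits 4i to 4i+3; ref_cfloat and ref_fpmac_c_rx are hypothetical names, xlen stands in for the size of the real X buffer (4, 8, 16 or 32 floats), and out-of-range indices are rejected rather than wrapped:

```c
#include <assert.h>

typedef struct { float re, im; } ref_cfloat; /* hypothetical stand-in for cfloat */

/* Scalar model: ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]). */
static void ref_fpmac_c_rx(ref_cfloat ret[4], const ref_cfloat acc[4],
                           const float *xbuf, int xlen, int xstart, unsigned int xoffs,
                           const ref_cfloat zbuf[4], int zstart, unsigned int zoffs)
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        unsigned int zo = (zoffs >> (4 * i)) & 0xFu;
        assert(zo <= 7); /* complex-lane offsets: 4th bit must be 0 */
        int zi = zstart + (int)zo;
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < 4);
        float r = xbuf[xi];
        ret[i].re = acc[i].re + r * zbuf[zi].re;
        ret[i].im = acc[i].im - r * zbuf[zi].im; /* conj flips the sign of Im(z) */
    }
}
```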

v4cfloat fpmac_c	(	v4cfloat	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_c	(	v4cfloat	acc,
		v4float	xbuf,
		v4cfloat	zbuf
	)

Multiply and accumulate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmac_c	(	v4cfloat	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_c	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.
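Here the conjugate is applied to the complex operand from X, so each lane computes conj(x) * r = Re(x)*r - i*Im(x)*r for a real r taken from Z. A scalar sketch, assuming lane i's offsets sit in bits 4i to 4i+3; ref_cfloat and ref_fpmac_c_xr are hypothetical names, and xlen stands in for the number of complex lanes in xbuf:

```c
#include <assert.h>

typedef struct { float re, im; } ref_cfloat; /* hypothetical stand-in for cfloat */

/* Scalar model: ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]. */
static void ref_fpmac_c_xr(ref_cfloat ret[4], const ref_cfloat acc[4],
                           const ref_cfloat *xbuf, int xlen, int xstart, unsigned int xoffs,
                           const float zbuf[8], int zstart, unsigned int zoffs)
{
    for (int i = 0; i < 4; i++) {
        unsigned int xo = (xoffs >> (4 * i)) & 0xFu;
        assert(xo <= 7); /* complex-lane offsets: 4th bit must be 0 */
        int xi = xstart + (int)xo;
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < 8);
        float r = zbuf[zi];
        ret[i].re = acc[i].re + xbuf[xi].re * r;
        ret[i].im = acc[i].im - xbuf[xi].im * r; /* conj(x) * r */
    }
}
```

With identity offsets (0x3210) on both sides, lane i computes conj(x[i]) * z[i].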

v4cfloat fpmac_c	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmac_c	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v8float	zbuf
	)

Multiply and accumulate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmac_c	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmac_cc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
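Since conj(x) * conj(z) = conj(x * z), each lane of fpmac_cc adds the complex conjugate of the ordinary product. The following scalar sketch checks that identity on one lane; ref_cfloat, cmul, cconj and mac_cc_lane are hypothetical helpers, not intrinsics:

```c
typedef struct { float re, im; } ref_cfloat; /* hypothetical stand-in for cfloat */

static ref_cfloat cmul(ref_cfloat a, ref_cfloat b)
{
    ref_cfloat r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

static ref_cfloat cconj(ref_cfloat a)
{
    a.im = -a.im;
    return a;
}

/* One lane of the pseudocode above: acc + conj(x) * conj(z). */
static ref_cfloat mac_cc_lane(ref_cfloat acc, ref_cfloat x, ref_cfloat z)
{
    ref_cfloat p = cmul(cconj(x), cconj(z));
    acc.re += p.re;
    acc.im += p.im;
    return acc;
}
```

For x = 1+2i and z = 3+4i, both conj(x) * conj(z) and conj(x * z) evaluate to -5-10i.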

v4cfloat fpmac_cc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into X for all lanes of the second multiplicand. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into X for the second multiplicand. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_cc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_cc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into X for all lanes of the second multiplicand. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into X for the second multiplicand. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_cc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[i]) * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmac_cc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_cc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into X for all lanes of the second multiplicand. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into X for the second multiplicand. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_cn	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_cn	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into X for all lanes of the second multiplicand. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into X for the second multiplicand. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
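The single-buffer form of fpmac_cn suits correlation-style computations, where both factors come from the same signal at different positions. A scalar sketch of a lag-k autocorrelation over four samples, the sum over n of conj(x[n]) * x[n + k], obtained by setting zstart = xstart + k. All names are hypothetical, and lane 0 is assumed to use the lowest offset nibble:

```c
#include <assert.h>

typedef struct { float re, im; } ref_cfloat; /* hypothetical stand-in for cfloat */

/* Scalar model, updating acc in place:
 * acc[i] += conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]] */
static void ref_fpmac_cn_1buf(ref_cfloat acc[4], const ref_cfloat *xbuf, int xlen,
                              int xstart, unsigned int xoffs,
                              int zstart, unsigned int zoffs)
{
    for (int i = 0; i < 4; i++) {
        unsigned int xo = (xoffs >> (4 * i)) & 0xFu, zo = (zoffs >> (4 * i)) & 0xFu;
        assert(xo <= 7 && zo <= 7);
        int xi = xstart + (int)xo, zi = zstart + (int)zo;
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < xlen);
        ref_cfloat a = xbuf[xi], b = xbuf[zi];
        /* conj(a) * b = (a.re*b.re + a.im*b.im) + i(a.re*b.im - a.im*b.re) */
        acc[i].re += a.re * b.re + a.im * b.im;
        acc[i].im += a.re * b.im - a.im * b.re;
    }
}

/* Lag-`lag` autocorrelation over samples 0..3, reduced across the four lanes. */
static ref_cfloat autocorr4(const ref_cfloat *x, int xlen, int lag)
{
    ref_cfloat acc[4] = {{0, 0}, {0, 0}, {0, 0}, {0, 0}};
    ref_fpmac_cn_1buf(acc, x, xlen, 0, 0x3210u, lag, 0x3210u);
    ref_cfloat s = {0, 0};
    for (int i = 0; i < 4; i++) {
        s.re += acc[i].re;
        s.im += acc[i].im;
    }
    return s;
}
```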

v4cfloat fpmac_cn	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
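One way to see why a complex product can be split into two accumulation passes: conj(x) * z = Re(x)*z + Im(x)*(-i*z), and each term multiplies a real number by a complex value. The sketch below compares the direct and two-pass forms on one lane. It illustrates the algebra only; whether the hardware splits the work in exactly this way is an assumption not stated in this documentation. With general inputs the two forms may also round differently, because the additions happen in a different order. The names are hypothetical:

```c
typedef struct { float re, im; } ref_cfloat; /* hypothetical stand-in for cfloat */

/* Direct form: acc + conj(x) * z. */
static ref_cfloat mac_cn_direct(ref_cfloat acc, ref_cfloat x, ref_cfloat z)
{
    acc.re += x.re * z.re + x.im * z.im;
    acc.im += x.re * z.im - x.im * z.re;
    return acc;
}

/* Two-pass form: each pass multiplies one real component of x by a complex value. */
static ref_cfloat mac_cn_two_pass(ref_cfloat acc, ref_cfloat x, ref_cfloat z)
{
    /* Pass 1: acc += Re(x) * z */
    acc.re += x.re * z.re;
    acc.im += x.re * z.im;
    /* Pass 2: acc += Im(x) * (-i * z), where -i * z = (z.im, -z.re) */
    acc.re += x.im * z.im;
    acc.im += x.im * -z.re;
    return acc;
}
```

For small integer inputs such as acc = 1+i, x = 2+3i and z = 4+5i, both forms are exact and return 24-i.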

v4cfloat fpmac_cn	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic, which results in two microcode instructions being scheduled. The same behavior can be achieved by calling fpmac_conf, the fully configurable multiply-accumulate intrinsic, twice so that the two terms of the complex multiplication are accumulated.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into X for all lanes of the second multiplicand. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into X for the second multiplicand. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_cn	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
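Because conj(x) * x = |x|^2, passing the same vector as xbuf and zbuf accumulates per-lane power in the real part with a zero imaginary part. A scalar sketch of that usage follows; the names are hypothetical and `std::array<std::complex<float>, 4>` stands in for v4cfloat.

```cpp
#include <array>
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;
using v4c = std::array<cfloat, 4>;  // hypothetical stand-in for v4cfloat

// Scalar model of the three-argument form: lane-wise acc + conj(x) * z.
v4c fpmac_cn_simple_model(v4c acc, const v4c& x, const v4c& z) {
    for (int i = 0; i < 4; ++i) acc[i] += std::conj(x[i]) * z[i];
    return acc;
}

// Hypothetical usage: sums |x|^2 per lane over a sequence of vectors.
v4c accumulate_power(const std::array<v4c, 2>& blocks) {
    v4c acc{};
    for (const v4c& b : blocks) acc = fpmac_cn_simple_model(acc, b, b);
    return acc;
}
```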

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmac_cn	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
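One way to calculate the offsets manually is to expand complex-lane offsets into float-lane offsets. The sketch assumes complex lane k occupies float lanes 2k (real) and 2k+1 (imaginary), so complex offset o becomes float offsets 2o and 2o+1; the [0,7] limit keeps 2o+1 within 4 bits. The helper is hypothetical, and under the same assumption xstart would also need to be doubled.

```cpp
#include <cassert>

// Hypothetical helper: expands a 4 x 4-bit complex-lane offset word into an
// 8 x 4-bit float-lane offset word. Assumes interleaved real/imag storage and
// lane 0 in the lowest nibble of both words.
unsigned int complex_to_float_offs(unsigned int coffs) {
    unsigned int foffs = 0;
    for (unsigned k = 0; k < 4; ++k) {
        unsigned o = (coffs >> (4 * k)) & 0xFu;
        assert(o <= 7);  // highest bit must be 0, so 2*o+1 fits in 4 bits
        foffs |= (2 * o) << (8 * k);          // float lane 2k (real part)
        foffs |= (2 * o + 1) << (8 * k + 4);  // float lane 2k+1 (imag part)
    }
    return foffs;
}
```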

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
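A common pattern with this form is to broadcast one coefficient to all lanes (zoffs = 0) while xoffs = 0x3210 selects consecutive samples, stepping xstart and zstart once per tap. The sketch below models that on the host. The names are hypothetical, and in real kernel code zstart must be a compile-time constant, so the tap loop would have to be unrolled.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;

// Hypothetical model of the two-buffer fpmac_cn form (lane selection only).
template <std::size_t NX, std::size_t NZ>
std::array<cfloat, 4> fpmac_cn2_model(std::array<cfloat, 4> acc,
                                      const std::array<cfloat, NX>& xbuf, int xstart, unsigned xoffs,
                                      const std::array<cfloat, NZ>& zbuf, int zstart, unsigned zoffs) {
    for (int i = 0; i < 4; ++i) {
        std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < NX && zi < NZ);
        acc[i] += std::conj(xbuf[xi]) * zbuf[zi];
    }
    return acc;
}

// Hypothetical usage: out[i] = sum over t of conj(x[i + t]) * taps[t].
std::array<cfloat, 4> correlate4(const std::array<cfloat, 16>& x,
                                 const std::array<cfloat, 4>& taps) {
    std::array<cfloat, 4> acc{};
    for (int t = 0; t < 4; ++t)
        acc = fpmac_cn2_model(acc, x, t, 0x3210u, taps, t, 0x0u);
    return acc;
}
```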

v4cfloat fpmac_cn	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v8float fpmac_conf	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
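The negation mask derived from addmode and addmask can be reproduced on the host. `AddMode` is a hypothetical stand-in for the fpadd_* constants; their actual values come from the intrinsics headers and are not assumed here.

```cpp
#include <cassert>

// Hypothetical stand-ins for fpadd_add, fpadd_sub, fpadd_mixadd, fpadd_mixsub.
enum class AddMode { add, sub, mixadd, mixsub };

// Per-lane negation mask, transcribed from the pseudocode: bit i set means
// lane i's product is subtracted from the accumulator.
unsigned neg_mask(AddMode mode, unsigned addmask) {
    addmask &= 0xFFu;  // only the 8 LSBs (one per lane) are used
    switch (mode) {
        case AddMode::add:    return addmask ^ 0x00u;
        case AddMode::sub:    return addmask ^ 0xFFu;
        case AddMode::mixadd: return addmask ^ 0xAAu;  // addmask = 0: odd lanes negated
        case AddMode::mixsub: return addmask ^ 0x55u;  // addmask = 0: even lanes negated
    }
    assert(false && "unknown add mode");
    return 0;
}

inline bool lane_negated(unsigned neg, int lane) { return (neg >> lane) & 1u; }
```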

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, the bit for each lane is set if the accumulator was chosen in that lane.
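A host-side sketch of the fpcmp_nrm path of the pseudocode, including ones, abs and the cmp sign bits, can be useful for testing offset and mask choices. It covers fpcmp_nrm only and takes the already-resolved neg mask; the function name, the `abs_val` spelling (to avoid clashing with `abs`) and the nibble order are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical model of fpmac_conf with cmpmode = fpcmp_nrm.
// Lane i's offset is assumed to sit in bits [4*i, 4*i+3] of xoffs/zoffs.
template <std::size_t NX, std::size_t NZ>
std::array<float, 8> fpmac_conf_nrm_model(const std::array<float, 8>& acc,
                                          const std::array<float, NX>& xbuf, int xstart, unsigned xoffs,
                                          const std::array<float, NZ>& zbuf, int zstart, unsigned zoffs,
                                          bool ones, bool abs_val, unsigned neg, unsigned& cmp) {
    std::array<float, 8> ret{};
    cmp = 0;
    for (int i = 0; i < 8; ++i) {
        std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < NX && (ones || zi < NZ));
        float m = xbuf[xi] * (ones ? 1.0f : zbuf[zi]);  // m[i]
        float a = abs_val ? std::fabs(m) : m;
        float n = ((neg >> i) & 1u) ? -a : a;            // n[i]
        float o = acc[i] + n;                            // o[i]
        cmp |= static_cast<unsigned>(std::signbit(o)) << i;
        ret[i] = o;
    }
    return ret;
}
```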

v8float fpmac_conf	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v8float fpmac_conf	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of the second multiplicand, which is read from xbuf.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, the bit for each lane is set if the accumulator was chosen in that lane.
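In this form the second multiplicand is also read from xbuf, so one buffer supplies both operands: for example squares (same start and offsets) or lagged products (a different zstart). A scalar sketch with hypothetical names, limited to ones = false, abs = false and fpcmp_nrm, and assuming lane 0 uses the lowest nibble:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the single-buffer fpmac_conf form (fpcmp_nrm only):
// both multiplicands are gathered from xbuf.
template <std::size_t N>
std::array<float, 8> fpmac_conf_xonly_model(std::array<float, 8> acc,
                                            const std::array<float, N>& xbuf,
                                            int xstart, unsigned xoffs,
                                            int zstart, unsigned zoffs, unsigned neg) {
    for (int i = 0; i < 8; ++i) {
        std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < N && zi < N);
        float m = xbuf[xi] * xbuf[zi];
        acc[i] += ((neg >> i) & 1u) ? -m : m;
    }
    return acc;
}
```

With a 16-element buffer, zstart = 8 pairs each lane with the sample eight positions later.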

v8float fpmac_conf	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of the second multiplicand, which is read from xbuf.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, the bit for each lane is set if the accumulator was chosen in that lane.
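For the v4cfloat overloads, the 8-lane view can be illustrated on the host. Assuming each complex value is stored real part first, fpadd_mixadd with addmask = 0 (neg = 0xAA) negates exactly the imaginary lanes, so accumulating with ones = true would add the conjugate of the selected values. The layout and the helper names are assumptions.

```cpp
#include <array>
#include <cassert>
#include <complex>

// Views four complex values as eight float lanes. Assumption: complex lane k
// occupies float lanes 2k (real) and 2k+1 (imaginary).
std::array<float, 8> as_float_lanes(const std::array<std::complex<float>, 4>& c) {
    std::array<float, 8> f{};
    for (int k = 0; k < 4; ++k) {
        f[2 * k] = c[k].real();
        f[2 * k + 1] = c[k].imag();
    }
    return f;
}

// Applies a per-lane negation mask, like (-1)^neg[i] in the pseudocode.
std::array<float, 8> apply_neg(std::array<float, 8> v, unsigned neg) {
    for (int i = 0; i < 8; ++i)
        if ((neg >> i) & 1u) v[i] = -v[i];
    return v;
}
```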

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v8float fpmac_conf	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, the bit for each lane is set if the accumulator was chosen in that lane.
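Lane selection with 8 x 4-bit offsets can be checked by modeling only the gather step (ones = true, acc = 0, no negation). The sketch reverses eight lanes from the upper half of a 16-element buffer using xstart = 8 and xoffs = 0x01234567. The helpers are hypothetical, and placing lane 0 in the lowest nibble is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: packs eight float-lane offsets into an 8 x 4-bit word.
unsigned int pack_offs8(const std::array<unsigned, 8>& offs) {
    unsigned int packed = 0;
    for (unsigned lane = 0; lane < 8; ++lane) {
        assert(offs[lane] <= 15);  // each offset has 4 bits
        packed |= offs[lane] << (4 * lane);
    }
    return packed;
}

// Gather step only: ret[i] = xbuf[xstart + xoffs[i]].
template <std::size_t N>
std::array<float, 8> gather8(const std::array<float, N>& xbuf, int xstart, unsigned xoffs) {
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        std::size_t idx = static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        assert(idx < N);
        ret[i] = xbuf[idx];
    }
    return ret;
}
```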

v8float fpmac_conf	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v8float fpmac_conf	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of the second multiplicand, which is read from xbuf.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, the bit for each lane is set if the accumulator was chosen in that lane.

v8float fpmac_conf	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of the second multiplicand, which is read from xbuf.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, the bit for each lane is set if the accumulator was chosen in that lane.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v8float fpmac_conf	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, the bit for each lane is set if the accumulator was chosen in that lane.

v8float fpmac_conf	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of a complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value of the product is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
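As a host-side illustration of the fpcmp_nrm path, the pseudocode above can be transcribed into scalar C. This is a minimal sketch, not the intrinsic itself: fpmac_conf_nrm_model and the FPADD_* values are hypothetical stand-ins, and it assumes that lane i takes its offset from bits 4i..4i+3 of xoffs and zoffs.

```c
#include <math.h>

/* Hypothetical numeric stand-ins for the fpadd_* selectors; the real
   constants come from the AI Engine compiler headers. */
enum { FPADD_ADD, FPADD_SUB, FPADD_MIXADD, FPADD_MIXSUB };

/* Scalar reference model of the fpcmp_nrm path for 8 real lanes.
   Assumes lane i's offset is nibble i of xoffs/zoffs (bits 4i..4i+3). */
void fpmac_conf_nrm_model(float ret[8], const float acc[8],
                          const float *xbuf, int xstart, unsigned xoffs,
                          const float *zbuf, int zstart, unsigned zoffs,
                          int ones, int abs_flag,
                          unsigned addmode, unsigned addmask)
{
    /* XOR patterns from the pseudocode, indexed by the enum above. */
    static const unsigned flip[4] = {0x00u, 0xFFu, 0xAAu, 0x55u};
    unsigned neg = (addmask ^ flip[addmode & 3u]) & 0xFFu;

    for (int i = 0; i < 8; i++) {
        float x = xbuf[xstart + (int)((xoffs >> (4 * i)) & 0xFu)];
        float z = ones ? 1.0f
                       : zbuf[zstart + (int)((zoffs >> (4 * i)) & 0xFu)];
        float m = x * z;                     /* m[i] */
        float n = abs_flag ? fabsf(m) : m;   /* abs before the sign flip */
        if ((neg >> i) & 1u)
            n = -n;                          /* (-1)^neg[i] */
        ret[i] = acc[i] + n;                 /* o[i], returned in nrm mode */
    }
}
```

For example, with addmode FPADD_MIXADD and addmask 0 the neg pattern is 0xAA, so lanes 1, 3, 5 and 7 subtract their product while lanes 0, 2, 4 and 6 add it.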

v8float fpmac_conf	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
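The compare branches can be read as the following C sketch. It follows the pseudocode branch for branch, takes the already signed product n[i] as input, and packs cmp[i] into bit i of the returned word (lane 0 in bit 0 is an assumption). fpmac_cmp_model and the FPCMP_* values are hypothetical.

```c
#include <math.h>

/* Hypothetical stand-ins for the cmpmode selectors. */
enum { FPCMP_NRM, FPCMP_LT, FPCMP_GE };

/* Literal transcription of the compare branches: given acc[i] and the
   signed product n[i], write ret[i] and return cmp with lane i in bit i. */
unsigned fpmac_cmp_model(float ret[8], const float acc[8], const float n[8],
                         unsigned cmpmode)
{
    unsigned cmp = 0;
    for (int i = 0; i < 8; i++) {
        float o = acc[i] + n[i];                 /* o[i] */
        unsigned bit = signbit(o) ? 1u : 0u;     /* signbit(o[i]) */
        if (cmpmode == FPCMP_GE)
            bit ^= 1u;                           /* ~signbit(o[i]) */
        cmp |= bit << i;
        if (cmpmode == FPCMP_NRM)
            ret[i] = o;
        else
            ret[i] = bit ? -n[i] : acc[i];       /* as written in the pseudocode */
    }
    return cmp;
}
```

Because o[i] = acc[i] + n[i], the sign that drives the selection depends on addmode: with fpadd_sub and addmask 0, n[i] = -m[i], so the tested quantity is acc[i] - m[i].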

v8float fpmac_conf	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
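In the overloads without zbuf, the second multiplicand is read from xbuf as well, so pointing zstart/zoffs at the same elements as xstart/xoffs squares each selected element. The sketch below models only that product with an fpadd_add accumulation (no masks, abs or ones); accumulate_squares is a hypothetical name and the nibble-per-lane offset layout is assumed.

```c
/* Assumed layout: lane i's offset is held in bits 4i..4i+3. */
static int lane_offset(unsigned offs, int lane)
{
    return (int)((offs >> (4 * lane)) & 0xFu);
}

/* Both multiplicands come from xbuf, as in the no-zbuf pseudocode. */
void accumulate_squares(float acc[8], const float xbuf[32],
                        int xstart, unsigned xoffs,
                        int zstart, unsigned zoffs)
{
    for (int i = 0; i < 8; i++) {
        float x = xbuf[xstart + lane_offset(xoffs, i)];
        float z = xbuf[zstart + lane_offset(zoffs, i)];  /* z read from xbuf */
        acc[i] += x * z;
    }
}
```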

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
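For the complex-accumulator overloads, the 8-lane view means fpadd_mixadd treats real and imaginary parts differently. Assuming the real part of complex element k sits in lane 2k and the imaginary part in lane 2k+1, neg = 0xAA negates only the imaginary lanes, which turns acc + x * z into acc + x * conj(z) for a real x. The sketch also duplicates each real x[k] into both lanes of complex element k via xoffs = 0x33221100 (nibble-per-lane decoding assumed, xstart = 0). The lane layout and the function name mac_real_times_conj are assumptions for illustration.

```c
/* acc[k] += x[k] * conj(z[k]) for 4 complex lanes, computed the way the
   pseudocode sees it: as 8 interleaved float lanes (re even, im odd). */
void mac_real_times_conj(float acc[8], const float *xbuf, const float zlanes[8])
{
    const unsigned xoffs = 0x33221100u;  /* lanes 2k and 2k+1 both read x[k] */
    const unsigned neg = 0x00u ^ 0xAAu;  /* fpadd_mixadd with addmask = 0 */
    for (int i = 0; i < 8; i++) {
        float m = xbuf[(xoffs >> (4 * i)) & 0xFu] * zlanes[i];
        acc[i] += ((neg >> i) & 1u) ? -m : m;  /* odd (imaginary) lanes negated */
    }
}
```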

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
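Setting ones and abs together turns the MAC into an accumulate-absolute-value step. Because the absolute value is taken before the (-1)^neg[i] sign is applied, a set neg bit yields acc[i] - |x[i]| rather than acc[i] + |x[i]|. ones_abs_model is a hypothetical name used only for this sketch.

```c
#include <math.h>

/* ones = true, abs = true: z is replaced by 1.0 and |m| is signed by neg. */
void ones_abs_model(float ret[8], const float acc[8], const float x[8],
                    unsigned neg)
{
    for (int i = 0; i < 8; i++) {
        float m = x[i] * 1.0f;   /* second multiplicand replaced with 1.0 */
        float n = fabsf(m);      /* absolute value before the sign flip */
        ret[i] = acc[i] + (((neg >> i) & 1u) ? -n : n);
    }
}
```

Per the pseudocode, neg = 0x00 corresponds to fpadd_add with addmask 0 and neg = 0xFF to fpadd_sub with addmask 0.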

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
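The "8 x 4 bits" offset words can be built from a table of per-lane offsets. The helper below is hypothetical and assumes lane i occupies bits 4i..4i+3; it rejects any offset that does not fit in four bits.

```c
#include <stdbool.h>

/* Pack eight per-lane offsets into one "8 x 4 bits" word.
   Returns false (leaving *packed untouched) if an offset exceeds 15. */
bool pack_offsets(const unsigned lane_off[8], unsigned *packed)
{
    unsigned word = 0;
    for (int i = 0; i < 8; i++) {
        if (lane_off[i] > 0xFu)
            return false;
        word |= lane_off[i] << (4 * i);
    }
    *packed = word;
    return true;
}
```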

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
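The "8 x 1 LSB bits" cmp output can be unpacked into one flag per lane. This hypothetical helper assumes lane i maps to bit i and ignores anything above bit 7.

```c
/* Unpack cmp into flags[0..7] and return how many lanes have their bit set. */
int unpack_cmp(unsigned cmp, int flags[8])
{
    int count = 0;
    for (int i = 0; i < 8; i++) {
        flags[i] = (int)((cmp >> i) & 1u);
        count += flags[i];
    }
    return count;
}
```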

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
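All four addmode settings reduce to a fixed XOR pattern applied to addmask, exactly as in the first four lines of the pseudocode. The sketch uses hypothetical FPADD_* values; the real constants come from the compiler headers.

```c
/* Hypothetical stand-ins for the fpadd_* selectors. */
enum { FPADD_ADD, FPADD_SUB, FPADD_MIXADD, FPADD_MIXSUB };

/* Bit i of the result set => lane i is negated. */
unsigned neg_mask(unsigned addmode, unsigned addmask)
{
    unsigned pattern;
    switch (addmode) {
    case FPADD_SUB:    pattern = 0xFFu; break;  /* all lanes */
    case FPADD_MIXADD: pattern = 0xAAu; break;  /* odd lanes */
    case FPADD_MIXSUB: pattern = 0x55u; break;  /* even lanes */
    default:           pattern = 0x00u; break;  /* fpadd_add: none */
    }
    return (addmask ^ pattern) & 0xFFu;
}
```

A set addmask bit flips the default for that lane, so under fpadd_sub a lane whose addmask bit is set adds its product instead of subtracting it.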

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
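Taken literally, the compare branches test signbit(o[i]) rather than o[i] < 0, and the two differ for a negative zero. The sketch below assumes IEEE 754 arithmetic with round-to-nearest (no -ffast-math): -0.0f + -0.0f sets the sign bit even though it does not compare less than zero, while 0.0f + -0.0f yields +0.0f. This page does not state whether the hardware matches the pseudocode in this corner case; the function names are hypothetical.

```c
#include <math.h>

/* cmp bit as written in the pseudocode: sign bit of o = acc + n. */
int cmp_bit_signbit(float acc, float n) { return signbit(acc + n) ? 1 : 0; }

/* A plain "less than zero" test, for comparison. */
int cmp_bit_less(float acc, float n) { return (acc + n) < 0.0f ? 1 : 0; }
```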

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
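A common way to use multiply-accumulate intrinsics like these (assumed here, not described on this page) is to keep eight independent lane accumulators across a loop and reduce them to a scalar only at the end. The sketch emulates that in plain C for an fpadd_add / fpcmp_nrm configuration with contiguous offsets; lanewise_dot is a hypothetical name and len is assumed to be a multiple of 8.

```c
/* Eight lane accumulators updated once per 8-element chunk, then reduced. */
float lanewise_dot(const float *x, const float *z, int len)
{
    float acc[8] = {0};
    for (int base = 0; base < len; base += 8)
        for (int i = 0; i < 8; i++)
            acc[i] += x[base + i] * z[base + i];  /* one 8-lane MAC per chunk */

    float sum = 0.0f;
    for (int i = 0; i < 8; i++)
        sum += acc[i];                            /* horizontal reduction */
    return sum;
}
```

Deferring the horizontal reduction keeps the loop body lane-parallel; the summation order differs from a sequential loop, so for general inputs the result can differ in the last bits.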

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of the second multiplicand, which is read from xbuf.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
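The pseudocode above can be cross-checked with a small scalar C++ model. This is a sketch, not the intrinsic: it works on plain `std::array<float, 8>` lanes and takes operands that are already gathered (x[i] = xbuf[xstart + xoffs[i]], z[i] likewise). The `AddMode`/`CmpMode` enums and `fpmac_conf_model` are hypothetical stand-ins for the fpadd_* and fpcmp_* constants, and the compare branch assumes, as the cmp parameter description states, that a set cmp bit means the accumulator was kept.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for fpadd_add/sub/mixadd/mixsub and fpcmp_nrm/lt/ge.
enum class AddMode { Add = 0, Sub = 1, MixAdd = 2, MixSub = 3 };
enum class CmpMode { Nrm, Lt, Ge };

using Lanes8 = std::array<float, 8>;

// Scalar model of one fpmac_conf call on 8 real lanes.
// x[i] and z[i] are the already-gathered operands of lane i.
Lanes8 fpmac_conf_model(const Lanes8& acc, const Lanes8& x, const Lanes8& z,
                        bool ones, bool abs_val, AddMode addmode,
                        unsigned addmask, CmpMode cmpmode, unsigned& cmp) {
  assert(addmask <= 0xFFu);  // 8 x 1 LSB bits
  // Per-mode negation pattern, XORed with addmask as in the pseudocode.
  static const unsigned pattern[4] = {0x00u, 0xFFu, 0xAAu, 0x55u};
  const unsigned neg = addmask ^ pattern[static_cast<int>(addmode)];

  Lanes8 ret{};
  cmp = 0;
  for (int i = 0; i < 8; ++i) {
    const float m = x[i] * (ones ? 1.0f : z[i]);
    float n = abs_val ? std::fabs(m) : m;  // abs is applied before negation
    if ((neg >> i) & 1u) n = -n;
    const float o = acc[i] + n;

    // Sign of the sum; inverted for fpcmp_ge.
    bool bit = std::signbit(o);
    if (cmpmode == CmpMode::Ge) bit = !bit;
    if (bit) cmp |= 1u << i;

    // fpcmp_nrm returns the sum. fpcmp_lt / fpcmp_ge keep acc[i] when the
    // bit is set, otherwise -n[i] (the product itself under fpadd_sub).
    ret[i] = (cmpmode == CmpMode::Nrm) ? o : (bit ? acc[i] : -n);
  }
  return ret;
}
```

With `AddMode::Sub`, `CmpMode::Lt` and `ones = true`, the model reduces to a per-lane minimum of acc and x.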

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be treated as always having 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]

Returns: Result vector.
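One use of the cmp output is a per-lane running minimum or maximum. The sketch below models only the compare path for addmode = fpadd_sub, addmask = 0, ones = true and abs = false, so that o[i] = acc[i] - x[i]. It assumes, following the cmp parameter description, that a set bit marks a lane where the accumulator was kept; `select_lanes` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Lanes8 = std::array<float, 8>;

// Compare path of fpmac_conf with fpadd_sub, addmask = 0, ones = true, abs = false.
// keep_min = true models fpcmp_lt (smaller value wins), false models fpcmp_ge.
Lanes8 select_lanes(const Lanes8& acc, const Lanes8& x, bool keep_min,
                    unsigned& cmp) {
  Lanes8 ret{};
  cmp = 0;
  for (int i = 0; i < 8; ++i) {
    const float o = acc[i] - x[i];  // o[i] = acc[i] + n[i] with n[i] = -x[i]
    const bool keep_acc = keep_min ? std::signbit(o) : !std::signbit(o);
    if (keep_acc) cmp |= 1u << i;   // bit i: accumulator was chosen
    ret[i] = keep_acc ? acc[i] : x[i];  // -n[i] == x[i]
  }
  return ret;
}
```

Feeding the result back in as acc over successive input vectors gives a running minimum (or maximum); the cmp bits of the latest call show which lanes kept their previous value.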

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	Output, 8 x 1 LSB bits: when cmpmode is fpcmp_ge or fpcmp_lt, bit i is set if the accumulator value was chosen for lane i.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be treated as always having 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
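Because the real and imaginary parts of each complex value occupy separate lanes, the fixed addmode patterns act on whole components. The sketch below assumes the real part sits in the even lane and the imaginary part in the odd lane of each pair (an assumption; the text only says lane 0 holds the first real and imaginary value), and its helper names are hypothetical. With addmask = 0, the fpadd_mixadd pattern 0xAA negates only the imaginary lanes, which conjugates the value, while fpadd_mixsub (0x55) negates only the real lanes.

```cpp
#include <array>
#include <cassert>
#include <complex>

using cf = std::complex<float>;

// neg = addmask ^ pattern, as in the pseudocode
// (pattern 0xAA for fpadd_mixadd, 0x55 for fpadd_mixsub).
unsigned neg_mask(unsigned pattern, unsigned addmask) {
  assert(addmask <= 0xFFu);
  return (addmask ^ pattern) & 0xFFu;
}

// Apply a negation mask to four complex values viewed as eight real lanes:
// lane 2k = Re(v[k]), lane 2k + 1 = Im(v[k]) (assumed ordering).
std::array<cf, 4> negate_lanes(const std::array<cf, 4>& v, unsigned neg) {
  std::array<float, 8> lanes{};
  for (int k = 0; k < 4; ++k) {
    lanes[2 * k] = v[k].real();
    lanes[2 * k + 1] = v[k].imag();
  }
  for (int i = 0; i < 8; ++i)
    if ((neg >> i) & 1u) lanes[i] = -lanes[i];

  std::array<cf, 4> out{};
  for (int k = 0; k < 4; ++k) out[k] = cf(lanes[2 * k], lanes[2 * k + 1]);
  return out;
}
```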

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be treated as always having 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	Output, 8 x 1 LSB bits: when cmpmode is fpcmp_ge or fpcmp_lt, bit i is set if the accumulator value was chosen for lane i.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be treated as always having 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be treated as always having 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of the second multiplicand, which is read from xbuf.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	Output, 8 x 1 LSB bits: when cmpmode is fpcmp_ge or fpcmp_lt, bit i is set if the accumulator value was chosen for lane i.

v4cfloat fpmac_conf	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be treated as always having 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? acc[i] : -n[i]

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of the second multiplicand, which is read from xbuf.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmac_nc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
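The two-cycle note says the same result can come from two calls to the fully configurable intrinsic. The sketch below shows one way the two terms could be split, in plain C++ on interleaved real lanes (real part in even lanes, imaginary part in odd lanes, an assumption). It uses explicit index arrays instead of the xstart/xoffs encoding, and `conf_pass`, `mac_nc_two_pass` and `mac_nc_ref` are hypothetical. The second pass negates the odd (imaginary) lanes, which matches neg = 0xAA, i.e. fpadd_mixadd with addmask = 0 in the fpmac_conf pseudocode.

```cpp
#include <array>
#include <cassert>
#include <complex>

using Lanes8 = std::array<float, 8>;
using Idx8 = std::array<int, 8>;

// One real-lane pass in the style of fpmac_conf with ones = false, abs = false,
// cmpmode = fpcmp_nrm: ret[i] = acc[i] +/- x[xi[i]] * z[zi[i]].
Lanes8 conf_pass(const Lanes8& acc, const Lanes8& x, const Idx8& xi,
                 const Lanes8& z, const Idx8& zi, unsigned neg) {
  Lanes8 ret{};
  for (int i = 0; i < 8; ++i) {
    const float m = x[xi[i]] * z[zi[i]];
    ret[i] = acc[i] + (((neg >> i) & 1u) ? -m : m);
  }
  return ret;
}

// acc + x * conj(z) on 4 interleaved complex lanes, computed as two passes.
Lanes8 mac_nc_two_pass(const Lanes8& acc, const Lanes8& x, const Lanes8& z) {
  // Pass 1, no negation: Re += xr * zr, Im += xi * zr.
  const Idx8 x1{0, 1, 2, 3, 4, 5, 6, 7}, z1{0, 0, 2, 2, 4, 4, 6, 6};
  // Pass 2, odd lanes negated (0xAA): Re += xi * zi, Im -= xr * zi.
  const Idx8 x2{1, 0, 3, 2, 5, 4, 7, 6}, z2{1, 1, 3, 3, 5, 5, 7, 7};
  return conf_pass(conf_pass(acc, x, x1, z, z1, 0x00u), x, x2, z, z2, 0xAAu);
}

// Reference result using std::complex.
Lanes8 mac_nc_ref(const Lanes8& acc, const Lanes8& x, const Lanes8& z) {
  Lanes8 ret{};
  for (int k = 0; k < 4; ++k) {
    const std::complex<float> a(acc[2 * k], acc[2 * k + 1]);
    const std::complex<float> xv(x[2 * k], x[2 * k + 1]);
    const std::complex<float> zv(z[2 * k], z[2 * k + 1]);
    const std::complex<float> r = a + xv * std::conj(zv);
    ret[2 * k] = r.real();
    ret[2 * k + 1] = r.imag();
  }
  return ret;
}
```

Each output lane receives exactly one product per pass, which is why two fpmac_conf-style calls suffice for the four real products of a complex multiplication.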

v4cfloat fpmac_nc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
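A scalar sketch of this single-buffer form on `std::complex<float>` lanes follows. `fpmac_nc_single` is a hypothetical name; offsets are decoded as one 4-bit nibble per lane with lane 0 in the least significant nibble (an assumption), and out-of-range indices are rejected rather than wrapped. Pointing both operands at the same samples gives the per-lane power |x|^2, while shifting zstart by one gives lag-1 products.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cf = std::complex<float>;

// ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
template <std::size_t N>
std::array<cf, 4> fpmac_nc_single(const std::array<cf, 4>& acc,
                                  const std::array<cf, N>& xbuf,
                                  int xstart, unsigned xoffs,
                                  int zstart, unsigned zoffs) {
  std::array<cf, 4> ret{};
  for (int i = 0; i < 4; ++i) {
    const unsigned xo = (xoffs >> (4 * i)) & 0xFu;
    const unsigned zo = (zoffs >> (4 * i)) & 0xFu;
    assert(xo <= 7 && zo <= 7);  // highest bit of each nibble must be 0
    const int xi = xstart + static_cast<int>(xo);
    const int zi = zstart + static_cast<int>(zo);
    assert(xi >= 0 && xi < static_cast<int>(N));
    assert(zi >= 0 && zi < static_cast<int>(N));
    ret[i] = acc[i] + xbuf[xi] * std::conj(xbuf[zi]);
  }
  return ret;
}
```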

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset in X of the second multiplicand, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset in X for the second multiplicand. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
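The notes suggest that, for runtime offsets, fpmac_conf with manually computed offsets may be more efficient. A sketch of that computation follows, assuming each complex lane c maps to real lanes 2c (real part) and 2c + 1 (imaginary part) and that lane 0's offset sits in the least significant nibble; neither assumption is stated in this section, and the helper names are hypothetical. Under that mapping the largest complex offset, 7, becomes real offsets 14 and 15, which would be consistent with the requirement that the highest bit of each complex-lane nibble be 0. The start value would scale the same way (2 * xstart).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Pack per-lane 4-bit offsets into one word, lane 0 in the least significant
// nibble (assumed ordering).
template <std::size_t N>
unsigned pack_offsets(const std::array<unsigned, N>& offs) {
  static_assert(N <= 8, "at most 8 nibbles fit in 32 bits");
  unsigned word = 0;
  for (std::size_t i = 0; i < N; ++i) {
    assert(offs[i] <= 0xFu);
    word |= offs[i] << (4 * i);
  }
  return word;
}

// Expand a 4 x 4-bit complex-lane offset word (each offset in [0,7]) into an
// 8 x 4-bit real-lane offset word: complex offset c becomes real offsets 2c
// (real part) and 2c + 1 (imaginary part).
unsigned complex_to_real_offsets(unsigned coffs) {
  unsigned roffs = 0;
  for (unsigned k = 0; k < 4; ++k) {
    const unsigned c = (coffs >> (4 * k)) & 0xFu;
    assert(c <= 7);  // highest bit of each complex-lane nibble must be 0
    roffs |= (2 * c) << (8 * k);
    roffs |= (2 * c + 1) << (8 * k + 4);
  }
  return roffs;
}
```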

v4cfloat fpmac_nc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_nc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset in X of the second multiplicand, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset in X for the second multiplicand. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_nc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmac_nc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmac_nc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset in X of the second multiplicand, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset in X for the second multiplicand. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v8float fpmsc	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
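To show how xstart/xoffs and zstart/zoffs select operands, here is a scalar C++ sketch of this overload. `fpmsc_model` is hypothetical; it assumes lane i's offset is nibble i of the offset word (lane 0 least significant) and rejects out-of-range indices instead of modelling any wrap-around. Setting zoffs = 0 broadcasts one coefficient to all lanes, as an FIR-style loop might do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
template <std::size_t NX, std::size_t NZ>
std::array<float, 8> fpmsc_model(const std::array<float, 8>& acc,
                                 const std::array<float, NX>& xbuf,
                                 int xstart, unsigned xoffs,
                                 const std::array<float, NZ>& zbuf,
                                 int zstart, unsigned zoffs) {
  std::array<float, 8> ret{};
  for (int i = 0; i < 8; ++i) {
    const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
    const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
    assert(xi >= 0 && xi < static_cast<int>(NX));
    assert(zi >= 0 && zi < static_cast<int>(NZ));
    ret[i] = acc[i] - xbuf[xi] * zbuf[zi];
  }
  return ret;
}
```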

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmsc	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
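In this form both operands come from xbuf. A sketch with the hypothetical `fpmsc_single` and the same assumed nibble ordering (lane 0 least significant): offsets 0, 2, 4, ..., 14 (0xECA86420) for both operands, with zstart = 1, subtract the products of adjacent pairs xbuf[2i] * xbuf[2i + 1].

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
template <std::size_t N>
std::array<float, 8> fpmsc_single(const std::array<float, 8>& acc,
                                  const std::array<float, N>& xbuf,
                                  int xstart, unsigned xoffs,
                                  int zstart, unsigned zoffs) {
  std::array<float, 8> ret{};
  for (int i = 0; i < 8; ++i) {
    const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
    const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
    assert(xi >= 0 && xi < static_cast<int>(N));
    assert(zi >= 0 && zi < static_cast<int>(N));
    ret[i] = acc[i] - xbuf[xi] * xbuf[zi];
  }
  return ret;
}
```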

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset in X of the second multiplicand, for all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset in X for the second multiplicand.

v4cfloat fpmsc	(	v4cfloat	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
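Here each complex lane is scaled by a real sample, so its real and imaginary parts are both multiplied by the same value. A sketch on `std::complex<float>` lanes, with the hypothetical `fpmsc_real_cplx` and the assumed nibble ordering (lane 0 least significant); per the parameter list, the zbuf offsets count complex lanes while the xbuf offsets index real samples.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cf = std::complex<float>;

// ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
// xbuf holds real samples; zbuf and acc hold complex values.
template <std::size_t NX, std::size_t NZ>
std::array<cf, 4> fpmsc_real_cplx(const std::array<cf, 4>& acc,
                                  const std::array<float, NX>& xbuf,
                                  int xstart, unsigned xoffs,
                                  const std::array<cf, NZ>& zbuf,
                                  int zstart, unsigned zoffs) {
  std::array<cf, 4> ret{};
  for (int i = 0; i < 4; ++i) {
    const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
    const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
    assert(xi >= 0 && xi < static_cast<int>(NX));
    assert(zi >= 0 && zi < static_cast<int>(NZ));
    ret[i] = acc[i] - xbuf[xi] * zbuf[zi];  // real times complex
  }
  return ret;
}
```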

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v8float fpmsc	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmsc	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset in X of the second multiplicand, for all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset in X for the second multiplicand.

v8float fpmsc	(	v8float	acc,
		v8float	xbuf,
		v8float	zbuf
	)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
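Comparing this pseudocode with the fpmac_conf pseudocode earlier on the page suggests that fpmsc corresponds to fpmac_conf with addmode = fpadd_sub, addmask = 0, ones = false, abs = false and cmpmode = fpcmp_nrm, since neg = 0xFF then negates every product. The sketch checks that reading on plain lanes only; it makes no claim about how the compiler lowers fpmsc, and both function names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Lanes8 = std::array<float, 8>;

// fpmsc pseudocode: ret[i] = acc[i] - xbuf[i] * zbuf[i]
Lanes8 fpmsc_ref(const Lanes8& acc, const Lanes8& x, const Lanes8& z) {
  Lanes8 ret{};
  for (int i = 0; i < 8; ++i) ret[i] = acc[i] - x[i] * z[i];
  return ret;
}

// fpmac_conf pseudocode with addmode = fpadd_sub (neg = addmask ^ 0xFF),
// addmask = 0, ones = false, abs = false, cmpmode = fpcmp_nrm.
Lanes8 conf_sub_ref(const Lanes8& acc, const Lanes8& x, const Lanes8& z) {
  const unsigned addmask = 0x00u;
  const unsigned neg = addmask ^ 0xFFu;
  Lanes8 ret{};
  for (int i = 0; i < 8; ++i) {
    const float m = x[i] * z[i];
    const float n = ((neg >> i) & 1u) ? -m : m;
    ret[i] = acc[i] + n;  // o[i], returned unchanged under fpcmp_nrm
  }
  return ret;
}
```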

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmsc	(	v4cfloat	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc	(	v4cfloat	acc,
		v4float	xbuf,
		v4cfloat	zbuf
	)

Multiply and subtract for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v8float fpmsc	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmsc	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset in X of the second multiplicand, for all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset in X for the second multiplicand.

v4cfloat fpmsc	(	v4cfloat	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

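For the complex operand, zstart and the zoffs fields count complex lanes, and each field must fit in 3 bits. The sketch below adds a hypothetical pack_coffs4 helper that builds such an offset word and rejects values above 7. It also includes a scalar model of this overload. Both assume that lane 0 occupies the least significant nibble, and cfloat_t is a stand-in for one complex lane.

```c
#include <assert.h>

/* Hypothetical stand-in for one complex lane. */
typedef struct { float re, im; } cfloat_t;

/* Pack four complex-lane offsets into a 4 x 4-bit word, checking that the
 * highest bit of each nibble is 0 (range [0,7]).
 * Assumption: lane 0 goes into the least significant nibble. */
static unsigned pack_coffs4(const unsigned offs[4])
{
    unsigned word = 0;
    for (int i = 0; i < 4; i++) {
        assert(offs[i] <= 7u);
        word |= offs[i] << (4 * i);
    }
    return word;
}

/* Hypothetical model of real-times-complex fpmsc with offsets.
 * xstart/xoffs count floats; zstart/zoffs count complex lanes. */
static void ref_fpmsc_rc_offs(cfloat_t ret[4], const cfloat_t acc[4],
                              const float x[32], int xstart, unsigned xoffs,
                              const cfloat_t z[4], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 32 && zi >= 0 && zi < 4);
        ret[i].re = acc[i].re - x[xi] * z[zi].re;
        ret[i].im = acc[i].im - x[xi] * z[zi].im;
    }
}
```

For example, packing the offsets {3, 2, 1, 0} gives 0x0123, which makes lane 0 read complex lane 3 of Z.
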
v4cfloat fpmsc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

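Because X holds complex values in this overload, one complex lane spans two consecutive floats. The hypothetical model below takes X as an interleaved float array (re0, im0, re1, im1, ...) to show that complex lane k maps to floats 2k and 2k + 1, while the Z offsets still count real lanes. The nibble order (lane 0 lowest) is an assumption.

```c
#include <assert.h>

typedef struct { float re, im; } cfloat_t;  /* hypothetical complex lane */

/* Hypothetical model of complex-times-real fpmsc.
 * x_interleaved holds 4 complex lanes as 8 floats: complex lane k is
 * x_interleaved[2k] (real) and x_interleaved[2k + 1] (imaginary). */
static void ref_fpmsc_cr(cfloat_t ret[4], const cfloat_t acc[4],
                         const float x_interleaved[8], int xstart, unsigned xoffs,
                         const float z[8], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        int xl = xstart + (int)((xoffs >> (4 * i)) & 0xFu);  /* complex lane */
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);  /* real lane */
        assert(xl >= 0 && xl < 4 && zi >= 0 && zi < 8);
        float xre = x_interleaved[2 * xl];
        float xim = x_interleaved[2 * xl + 1];
        /* A real factor scales both components of the complex factor. */
        ret[i].re = acc[i].re - xre * z[zi];
        ret[i].im = acc[i].im - xim * z[zi];
    }
}
```
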
v4cfloat fpmsc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, with each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

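The two-cycle note can be illustrated with plain arithmetic. The complex product splits into a term driven by the real part of X and a term driven by its imaginary part. Subtracting the two terms one after the other gives the same result as subtracting the full product. This is a sketch of the decomposition only, not of the actual microcode or of the fpmac_conf interface, and msc_direct and msc_two_pass are hypothetical names.

```c
typedef struct { float re, im; } cfloat_t;  /* hypothetical complex lane */

/* Direct model: acc - x * z for complex x and z. */
static cfloat_t msc_direct(cfloat_t acc, cfloat_t x, cfloat_t z)
{
    cfloat_t r = { acc.re - (x.re * z.re - x.im * z.im),
                   acc.im - (x.re * z.im + x.im * z.re) };
    return r;
}

/* Two-pass model: each pass subtracts one term of the complex product.
 * Pass 1 uses only x.re; pass 2 uses only x.im. */
static cfloat_t msc_two_pass(cfloat_t acc, cfloat_t x, cfloat_t z)
{
    cfloat_t t;
    /* Pass 1: subtract x.re * (z.re + j z.im). */
    t.re = acc.re - x.re * z.re;
    t.im = acc.im - x.re * z.im;
    /* Pass 2: subtract (j x.im) * (z.re + j z.im) = -x.im*z.im + j x.im*z.re. */
    t.re = t.re + x.im * z.im;
    t.im = t.im - x.im * z.re;
    return t;
}
```

With small integer inputs, both functions agree exactly. With general inputs, the two evaluation orders can round differently, so in this model the results are expected to agree only to within rounding.
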
v4cfloat fpmsc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, with each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

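As in the real case, the overload without zbuf reads both complex factors from xbuf. The hypothetical model below spells out the complex product term by term. Offsets count complex lanes, and lane 0 is assumed to occupy the least significant nibble.

```c
#include <assert.h>

typedef struct { float re, im; } cfloat_t;  /* hypothetical complex lane */

/* Hypothetical model of the zbuf-less complex-times-complex fpmsc:
 *   ret[i] = acc[i] - x[xstart + xoffs[i]] * x[zstart + zoffs[i]] */
static void ref_fpmsc_cc_self(cfloat_t ret[4], const cfloat_t acc[4],
                              const cfloat_t x[4], int xstart, unsigned xoffs,
                              int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        int a = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int b = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(a >= 0 && a < 4 && b >= 0 && b < 4);
        /* (p + jq)(r + js) = (pr - qs) + j(ps + qr) */
        ret[i].re = acc[i].re - (x[a].re * x[b].re - x[a].im * x[b].im);
        ret[i].im = acc[i].im - (x[a].re * x[b].im + x[a].im * x[b].re);
    }
}
```
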
v4cfloat fpmsc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmsc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v8float	zbuf
	)

Multiply and subtract for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmsc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, with each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, with each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, with each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmsc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmsc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, with each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, with each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v8float fpmsc_abs	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

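The absolute value is applied to each product before it is subtracted. As a result, for finite inputs, no lane of the result exceeds acc[i]. Below is a hypothetical scalar model of this overload. It assumes that lane 0 occupies the least significant offset nibble and requires in-range indices.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of fpmsc_abs with 8-entry X and Z buffers:
 *   ret[i] = acc[i] - |x[xstart + xoffs[i]] * z[zstart + zoffs[i]]| */
static void ref_fpmsc_abs(float ret[8], const float acc[8],
                          const float x[8], int xstart, unsigned xoffs,
                          const float z[8], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 8 && zi >= 0 && zi < 8);
        /* Take the absolute value of the product, then subtract it. */
        ret[i] = acc[i] - fabsf(x[xi] * z[zi]);
    }
}
```
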
v8float fpmsc_abs	(	v8float	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.

v8float fpmsc_abs	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmsc_abs	(	v8float	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.

v8float fpmsc_abs	(	v8float	acc,
		v8float	xbuf,
		v8float	zbuf
	)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - abs(xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v8float fpmsc_abs	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmsc_abs	(	v8float	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.

v4cfloat fpmsc_c	(	v4cfloat	acc,
		v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

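Conjugating Z only flips the sign of its imaginary part. Compared with plain fpmsc, the imaginary update therefore changes from a subtraction to an addition. Below is a hypothetical scalar model of this overload. It assumes that lane 0 occupies the least significant nibble and that zstart/zoffs count complex lanes.

```c
#include <assert.h>

typedef struct { float re, im; } cfloat_t;  /* hypothetical complex lane */

/* Hypothetical model of fpmsc_c (real times conjugated complex):
 *   ret[i] = acc[i] - x[xstart + xoffs[i]] * conj(z[zstart + zoffs[i]]) */
static void ref_fpmsc_c_rc(cfloat_t ret[4], const cfloat_t acc[4],
                           const float x[8], int xstart, unsigned xoffs,
                           const cfloat_t z[4], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 8 && zi >= 0 && zi < 4);
        /* x * conj(z) = x*z.re - j x*z.im */
        ret[i].re = acc[i].re - x[xi] * z[zi].re;
        ret[i].im = acc[i].im + x[xi] * z[zi].im;
    }
}
```
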
v4cfloat fpmsc_c	(	v4cfloat	acc,
		v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_c	(	v4cfloat	acc,
		v4float	xbuf,
		v4cfloat	zbuf
	)

Multiply and subtract for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmsc_c	(	v4cfloat	acc,
		v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_c	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

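Here the conjugate is taken of the complex X operand, so the imaginary part of X enters with the opposite sign. Below is a hypothetical scalar model of this overload. It assumes that lane 0 occupies the least significant nibble, that xstart/xoffs count complex lanes, and that zstart/zoffs count real lanes.

```c
#include <assert.h>

typedef struct { float re, im; } cfloat_t;  /* hypothetical complex lane */

/* Hypothetical model of fpmsc_c (conjugated complex times real):
 *   ret[i] = acc[i] - conj(x[xstart + xoffs[i]]) * z[zstart + zoffs[i]] */
static void ref_fpmsc_c_cr(cfloat_t ret[4], const cfloat_t acc[4],
                           const cfloat_t x[4], int xstart, unsigned xoffs,
                           const float z[8], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 4 && zi >= 0 && zi < 8);
        /* conj(x) * z = x.re*z - j x.im*z */
        ret[i].re = acc[i].re - x[xi].re * z[zi];
        ret[i].im = acc[i].im + x[xi].im * z[zi];
    }
}
```
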
v4cfloat fpmsc_c	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmsc_c	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v8float	zbuf
	)

Multiply and subtract for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmsc_c	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmsc_cc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, with each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, giving a range per lane of [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

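Since conj(a) * conj(b) = conj(a * b), this overload subtracts the conjugate of the ordinary complex product. The hypothetical model below uses that identity directly. The function name, the cfloat_t type, and the nibble order (lane 0 lowest) are assumptions.

```c
#include <assert.h>

typedef struct { float re, im; } cfloat_t;  /* hypothetical complex lane */

/* Hypothetical model of fpmsc_cc:
 *   ret[i] = acc[i] - conj(x[xstart + xoffs[i]]) * conj(z[zstart + zoffs[i]]) */
static void ref_fpmsc_cc_conj(cfloat_t ret[4], const cfloat_t acc[4],
                              const cfloat_t x[4], int xstart, unsigned xoffs,
                              const cfloat_t z[4], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        int xa = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zb = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xa >= 0 && xa < 4 && zb >= 0 && zb < 4);
        const cfloat_t a = x[xa], b = z[zb];
        /* conj(a) * conj(b) = conj(a * b): same real part as a * b,
         * negated imaginary part. */
        ret[i].re = acc[i].re - (a.re * b.re - a.im * b.im);
        ret[i].im = acc[i].im + (a.re * b.im + a.im * b.re);
    }
}
```
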
v4cfloat fpmsc_cc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, with each call adding one of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
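In this single-buffer form both multiplicands are read from xbuf. As a hedged illustration, the C++ sketch below is a scalar reference model of the pseudocode, not the intrinsic; the names `cf` and `fpmsc_cc_single_ref` are hypothetical, and it assumes lane i's offset is the 4-bit field at bits 4i to 4i+3. Out-of-range indices are rejected rather than modelled.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cf = std::complex<float>;

// Hypothetical scalar model of single-buffer fpmsc_cc:
// ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
template <std::size_t N>
std::array<cf, 4> fpmsc_cc_single_ref(const std::array<cf, 4>& acc,
                                      const std::array<cf, N>& xbuf,
                                      int xstart, std::uint32_t xoffs,
                                      int zstart, std::uint32_t zoffs) {
    std::array<cf, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
        const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < static_cast<int>(N));
        assert(zi >= 0 && zi < static_cast<int>(N));
        // Both operands come from xbuf and both are conjugated.
        ret[i] = acc[i] - std::conj(xbuf[xi]) * std::conj(xbuf[zi]);
    }
    return ret;
}
```

With `zoffs = 0` every lane reuses the same second multiplicand, here `xbuf[zstart]`, which is a simple way to scale a window of X by one coefficient stored in the same buffer.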

v4cfloat fpmsc_cc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
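With a wider X buffer, xstart slides the four-lane window across it while Z stays fixed. The following C++ sketch is a hypothetical scalar reference model of the two-buffer pseudocode (not the intrinsic); `fpmsc_cc_ref` is an illustrative name, and the 4-bit-per-lane field layout is an assumption.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cf = std::complex<float>;

// Hypothetical scalar model of two-buffer fpmsc_cc:
// ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
template <std::size_t NX, std::size_t NZ>
std::array<cf, 4> fpmsc_cc_ref(const std::array<cf, 4>& acc,
                               const std::array<cf, NX>& xbuf, int xstart, std::uint32_t xoffs,
                               const std::array<cf, NZ>& zbuf, int zstart, std::uint32_t zoffs) {
    std::array<cf, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
        const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < static_cast<int>(NX));
        assert(zi >= 0 && zi < static_cast<int>(NZ));
        ret[i] = acc[i] - std::conj(xbuf[xi]) * std::conj(zbuf[zi]);
    }
    return ret;
}
```

Calling it with an 8-lane X, `xstart = 4` and `xoffs = 0x3210` reads the upper half of X, lane by lane, against Z lanes 0 to 3.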

v4cfloat fpmsc_cc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_cc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[i]) * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.
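Expanding one lane in real arithmetic makes the effect of the two conjugations explicit. With $x = a + bi$ taken from xbuf and $z = c + di$ taken from zbuf, the product is the conjugate of $xz$, and the lane result follows directly:

```latex
\begin{aligned}
\overline{x}\,\overline{z} &= (a - bi)(c - di) = (ac - bd) - (ad + bc)\,i = \overline{xz},\\
\operatorname{Re}(\mathrm{ret}) &= \operatorname{Re}(\mathrm{acc}) - (ac - bd),\\
\operatorname{Im}(\mathrm{ret}) &= \operatorname{Im}(\mathrm{acc}) + (ad + bc).
\end{aligned}
```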

v4cfloat fpmsc_cc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_cc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_cn	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
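The cn variant differs from cc only in that Z is not conjugated. A hedged C++ sketch of the pseudocode follows as a scalar reference model; `fpmsc_cn_ref` is a hypothetical name, and the lane i offset is assumed to sit in bits 4i to 4i+3.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cf = std::complex<float>;

// Hypothetical scalar model of two-buffer fpmsc_cn:
// ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
template <std::size_t NX, std::size_t NZ>
std::array<cf, 4> fpmsc_cn_ref(const std::array<cf, 4>& acc,
                               const std::array<cf, NX>& xbuf, int xstart, std::uint32_t xoffs,
                               const std::array<cf, NZ>& zbuf, int zstart, std::uint32_t zoffs) {
    std::array<cf, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
        const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < static_cast<int>(NX));
        assert(zi >= 0 && zi < static_cast<int>(NZ));
        // Only the X operand is conjugated.
        ret[i] = acc[i] - std::conj(xbuf[xi]) * zbuf[zi];
    }
    return ret;
}
```

Passing `zoffs = 0` broadcasts `zbuf[zstart]` to all four lanes, as in a correlation against a single complex tap.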

v4cfloat fpmsc_cn	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_cn	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_cn	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_cn	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.
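A useful property of the cn form is that conj(x) * x = |x|^2, so passing the same vector as xbuf and zbuf subtracts each lane's squared magnitude from the accumulator. The C++ sketch below is a hypothetical scalar reference model of the three-argument pseudocode, not the intrinsic; `fpmsc_cn_ref` is an illustrative name.

```cpp
#include <array>
#include <cassert>
#include <complex>

using cf = std::complex<float>;

// Hypothetical scalar model of three-argument fpmsc_cn:
// ret[i] = acc[i] - conj(xbuf[i]) * zbuf[i]
inline std::array<cf, 4> fpmsc_cn_ref(const std::array<cf, 4>& acc,
                                      const std::array<cf, 4>& xbuf,
                                      const std::array<cf, 4>& zbuf) {
    std::array<cf, 4> ret{};
    for (int i = 0; i < 4; ++i)
        ret[i] = acc[i] - std::conj(xbuf[i]) * zbuf[i];
    return ret;
}
```

For x = 3 + 4i in lane 0, conj(x) * x is 25 with a zero imaginary part, so an accumulator of 25 becomes 0.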

v4cfloat fpmsc_cn	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_cn	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_nc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
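In the nc variant the conjugation moves to Z. The hedged C++ sketch below models the pseudocode on the host; `fpmsc_nc_ref` is a hypothetical name and the 4-bit-per-lane offset layout is an assumption. The usage in the test reverses Z with `zoffs = 0x0123`, so lane 0 reads `zbuf[3]` and lane 3 reads `zbuf[0]`.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cf = std::complex<float>;

// Hypothetical scalar model of two-buffer fpmsc_nc:
// ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
template <std::size_t NX, std::size_t NZ>
std::array<cf, 4> fpmsc_nc_ref(const std::array<cf, 4>& acc,
                               const std::array<cf, NX>& xbuf, int xstart, std::uint32_t xoffs,
                               const std::array<cf, NZ>& zbuf, int zstart, std::uint32_t zoffs) {
    std::array<cf, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
        const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < static_cast<int>(NX));
        assert(zi >= 0 && zi < static_cast<int>(NZ));
        // Only the Z operand is conjugated.
        ret[i] = acc[i] - xbuf[xi] * std::conj(zbuf[zi]);
    }
    return ret;
}
```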

v4cfloat fpmsc_nc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_nc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_nc	(	v4cfloat	acc,
		v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_nc	(	v4cfloat	acc,
		v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.
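Writing one lane out in real arithmetic shows how nc relates to cn: with $x = a + bi$ and $z = c + di$, both products share the real part $ac + bd$ and differ only in the sign of the imaginary part, so $x\,\overline{z} = \overline{\overline{x}\,z}$:

```latex
\begin{aligned}
x\,\overline{z} &= (a + bi)(c - di) = (ac + bd) + (bc - ad)\,i,\\
\overline{x}\,z &= (a - bi)(c + di) = (ac + bd) + (ad - bc)\,i,\\
\operatorname{Re}(\mathrm{ret}) &= \operatorname{Re}(\mathrm{acc}) - (ac + bd),\\
\operatorname{Im}(\mathrm{ret}) &= \operatorname{Im}(\mathrm{acc}) - (bc - ad).
\end{aligned}
```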

v4cfloat fpmsc_nc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmsc_nc	(	v4cfloat	acc,
		v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic; it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

acc	Incoming accumulation vector.
xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit of each lane's field must be 0, so the range per lane is [0,7]. Offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
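A scalar reference model can help when checking lane selection on the host. The sketch below is a hypothetical C99 model of the fpmsc_nc pseudocode, using float complex arrays in place of v4cfloat and v16cfloat. It assumes the LSB-first nibble layout for xoffs and zoffs and does not model what happens when an index falls outside the buffer.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical scalar model of fpmsc_nc:
 *   ret[i] = acc[i] - x[xstart + xoffs_i] * conj(x[zstart + zoffs_i])
 * Both operands come from the same 16-lane complex buffer x. */
static void model_fpmsc_nc(float complex ret[4], const float complex acc[4],
                           const float complex x[16], int xstart, unsigned xoffs,
                           int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        /* Out-of-range indices are rejected, not modeled. */
        assert(xi >= 0 && xi < 16 && zi >= 0 && zi < 16);
        ret[i] = acc[i] - x[xi] * conjf(x[zi]);
    }
}
```

With acc set to zero and both offsets pointing at the same element, a lane returns minus the squared magnitude of that element, because a value times its own conjugate is real.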

v8float fpmul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.
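The sketch below is a hypothetical host-side model (model_fpmul is not a real API) of the real times real lane selection. The xlen parameter lets it stand in for the v8float, v16float and v32float X-buffer overloads; the LSB-first nibble layout is an assumption, and indices outside the buffers are rejected rather than modeled.

```c
#include <assert.h>

/* Hypothetical model of real x real fpmul:
 *   ret[i] = x[xstart + xoffs_i] * z[zstart + zoffs_i],  i = 0..7
 * xoffs_i / zoffs_i are 4-bit fields, lane i in bits 4*i..4*i+3 (assumed).
 * xlen is the X buffer size (8, 16 or 32). */
static void model_fpmul(float ret[8], const float *x, int xlen, int xstart,
                        unsigned xoffs, const float z[8], int zstart,
                        unsigned zoffs)
{
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        /* Wrap-around or other out-of-range behavior is not modeled. */
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < 8);
        ret[i] = x[xi] * z[zi];
    }
}
```

For example, with xoffs = 0x76543210 and zoffs = 0x01234567, lane i multiplies x[i] by z[7 - i], so Z is read in reverse order.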

v8float fpmul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.
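When no Z buffer is passed, both operands are read from xbuf. The hypothetical model below (model_fpmul_self is not a real API, and the nibble layout is assumed) shows two common selections: identical offsets give element-wise squares, and shifting zstart by one pairs each element with its neighbor.

```c
#include <assert.h>

/* Hypothetical model of the single-buffer real fpmul:
 *   ret[i] = x[xstart + xoffs_i] * x[zstart + zoffs_i],  i = 0..7
 * xlen is the X buffer size (8, 16 or 32); nibbles assumed LSB-first. */
static void model_fpmul_self(float ret[8], const float *x, int xlen,
                             int xstart, unsigned xoffs,
                             int zstart, unsigned zoffs)
{
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < xlen);
        ret[i] = x[xi] * x[zi]; /* both operands come from x */
    }
}
```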

v4cfloat fpmul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
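In the real times complex overloads, xstart and xoffs count real lanes while zstart and zoffs count complex lanes. The hypothetical model below (model_fpmul_rc is not a real API; the nibble layout is assumed) makes that explicit: each real factor scales both the real and imaginary part of the selected complex value.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical model of real x complex fpmul:
 *   ret[i] = x[xstart + xoffs_i] * z[zstart + zoffs_i],  i = 0..3
 * x offsets count real lanes, z offsets count complex lanes. */
static void model_fpmul_rc(float complex ret[4], const float *x, int xlen,
                           int xstart, unsigned xoffs,
                           const float complex z[4], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        unsigned zo = (zoffs >> (4 * i)) & 0xFu;
        assert(zo <= 7);            /* highest bit of each Z nibble must be 0 */
        int zi = zstart + (int)zo;  /* complex-lane index */
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < 4);
        /* A real factor scales both the real and imaginary parts. */
        ret[i] = x[xi] * z[zi];
    }
}
```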

v8float fpmul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.
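A typical use of the wider v16float X buffer is a sliding window: xoffs = 0x76543210 walks consecutive samples while zoffs = 0 broadcasts the single coefficient z[zstart] to all eight lanes. The sketch below illustrates this with a hypothetical model (model_fpmul16 and fir2 are not real APIs; the nibble layout is assumed). On the device the partial products would more likely be accumulated with fpmac than summed in a plain C loop.

```c
#include <assert.h>

/* Hypothetical model of the v16float x v8float fpmul lane selection. */
static void model_fpmul16(float ret[8], const float x[16], int xstart,
                          unsigned xoffs, const float z[8], int zstart,
                          unsigned zoffs)
{
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 16 && zi >= 0 && zi < 8);
        ret[i] = x[xi] * z[zi];
    }
}

/* Two-tap FIR y[n] = h[0]*x[n] + h[1]*x[n+1], n = 0..7.
 * xstart advances the window by one sample per tap; zoffs = 0 broadcasts h[k]. */
static void fir2(float y[8], const float x[16], const float h[8])
{
    float t0[8], t1[8];
    model_fpmul16(t0, x, 0, 0x76543210u, h, 0, 0x00000000u);
    model_fpmul16(t1, x, 1, 0x76543210u, h, 1, 0x00000000u);
    for (int i = 0; i < 8; i++)
        y[i] = t0[i] + t1[i];
}
```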

v8float fpmul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v8float fpmul	(	v8float	xbuf,
		v8float	zbuf
	)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.
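Under the LSB-first nibble layout assumed in the other sketches, this two-argument overload should match the offset-based form with xstart = zstart = 0 and identity offsets 0x76543210. The hypothetical models below (not real APIs) make that equivalence easy to check on the host.

```c
#include <assert.h>

/* Hypothetical model of the offset-based form (nibbles assumed LSB-first). */
static void model_fpmul_offs(float ret[8], const float x[8], int xstart,
                             unsigned xoffs, const float z[8], int zstart,
                             unsigned zoffs)
{
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 8 && zi >= 0 && zi < 8);
        ret[i] = x[xi] * z[zi];
    }
}

/* Hypothetical model of the simple element-wise form: ret[i] = x[i] * z[i]. */
static void model_fpmul_simple(float ret[8], const float x[8], const float z[8])
{
    for (int i = 0; i < 8; i++)
        ret[i] = x[i] * z[i];
}
```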

v4cfloat fpmul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul	(	v4float	xbuf,
		v4cfloat	zbuf
	)

Multiply for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v8float fpmul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpmul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand for all lanes of X.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v4cfloat fpmul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.
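In the complex times real overloads the roles flip: xstart and xoffs count complex lanes, while zstart and zoffs count real lanes. The hypothetical model below (model_fpmul_cr is not a real API; the nibble layout is assumed) checks the 3-bit limit only on the complex-lane offsets.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical model of complex x real fpmul:
 *   ret[i] = x[xstart + xoffs_i] * z[zstart + zoffs_i],  i = 0..3
 * x offsets count complex lanes, z offsets count real lanes. */
static void model_fpmul_cr(float complex ret[4], const float complex *x, int xlen,
                           int xstart, unsigned xoffs,
                           const float z[8], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        unsigned xo = (xoffs >> (4 * i)) & 0xFu;
        assert(xo <= 7);             /* highest bit of each X nibble must be 0 */
        int xi = xstart + (int)xo;   /* complex-lane index */
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu); /* real-lane index */
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < 8);
        ret[i] = x[xi] * z[zi];
    }
}
```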

v4cfloat fpmul	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
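One way to see why a complex times complex product splits naturally into two passes is the identity x*z = re(x)*z + im(x)*(i*z), where each term needs only real times complex multiplies. The sketch below checks this identity in C99; term_re and term_im are hypothetical helpers, and this decomposition is an illustration, not a claim about how the two microcode instructions actually divide the work.

```c
#include <assert.h>
#include <complex.h>

/* First term: real part of x times z. */
static float complex term_re(float complex x, float complex z)
{
    return crealf(x) * z;
}

/* Second term: imaginary part of x times (i * z),
 * where i * z = -im(z) + i * re(z) is formed without a complex multiply. */
static float complex term_im(float complex x, float complex z)
{
    float complex iz = -cimagf(z) + crealf(z) * I;
    return cimagf(x) * iz;
}
```

Summing the two terms reproduces the full product, which is the role the two accumulating calls play in the note above.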

v4cfloat fpmul	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmul	(	v4cfloat	xbuf,
		v8float	zbuf
	)

Multiply for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmul	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul	(	v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmul	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmul	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_c	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
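For fpmul_c with a real X operand, conjugation only flips the sign of the imaginary part of the selected Z value before it is scaled. The hypothetical model below (model_fpmul_c_rc is not a real API; the nibble layout is assumed) follows the pseudocode lane by lane.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical model of fpmul_c (real x conj(complex)):
 *   ret[i] = x[xstart + xoffs_i] * conj(z[zstart + zoffs_i]),  i = 0..3
 * x offsets count real lanes, z offsets count complex lanes. */
static void model_fpmul_c_rc(float complex ret[4], const float *x, int xlen,
                             int xstart, unsigned xoffs,
                             const float complex z[4], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        unsigned zo = (zoffs >> (4 * i)) & 0xFu;
        assert(zo <= 7);            /* highest bit of each Z nibble must be 0 */
        int zi = zstart + (int)zo;
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < 4);
        /* Conjugation flips the sign of the imaginary part only. */
        ret[i] = x[xi] * conjf(z[zi]);
    }
}
```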

v4cfloat fpmul_c	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_c	(	v4float	xbuf,
		v4cfloat	zbuf
	)

Multiply for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmul_c	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_c	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.
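For fpmul_c with a complex X operand and a real Z operand, the conjugate is applied to X. Because Z is real, the result also equals conj(x * z), so conjugating before or after the multiply gives the same value. The hypothetical model below (model_fpmul_c_cr is not a real API; the nibble layout is assumed) follows the pseudocode directly.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical model of fpmul_c (conj(complex) x real):
 *   ret[i] = conj(x[xstart + xoffs_i]) * z[zstart + zoffs_i],  i = 0..3
 * x offsets count complex lanes, z offsets count real lanes. */
static void model_fpmul_c_cr(float complex ret[4], const float complex *x, int xlen,
                             int xstart, unsigned xoffs,
                             const float z[8], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        unsigned xo = (xoffs >> (4 * i)) & 0xFu;
        assert(xo <= 7);            /* highest bit of each X nibble must be 0 */
        int xi = xstart + (int)xo;
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < 8);
        ret[i] = conjf(x[xi]) * z[zi];
    }
}
```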

v4cfloat fpmul_c	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmul_c	(	v4cfloat	xbuf,
		v8float	zbuf
	)

Multiply for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmul_c	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpmul_cc	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, accumulating the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
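Because conj(x) * conj(z) equals conj(x * z), fpmul_cc yields the conjugate of the plain complex product of the selected lanes. The hypothetical model below (model_fpmul_cc is not a real API; the LSB-first nibble layout is assumed) follows the pseudocode and can be used to confirm that identity on the host.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical model of fpmul_cc with separate X and Z buffers:
 *   ret[i] = conj(x[xstart + xoffs_i]) * conj(z[zstart + zoffs_i]),  i = 0..3
 * All offsets count complex lanes. */
static void model_fpmul_cc(float complex ret[4], const float complex *x, int xlen,
                           int xstart, unsigned xoffs,
                           const float complex z[4], int zstart, unsigned zoffs)
{
    for (int i = 0; i < 4; i++) {
        unsigned xo = (xoffs >> (4 * i)) & 0xFu;
        unsigned zo = (zoffs >> (4 * i)) & 0xFu;
        assert(xo <= 7 && zo <= 7); /* highest bit of each nibble must be 0 */
        int xi = xstart + (int)xo;
        int zi = zstart + (int)zo;
        assert(xi >= 0 && xi < xlen && zi >= 0 && zi < 4);
        ret[i] = conjf(x[xi]) * conjf(z[zi]);
    }
}
```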

v4cfloat fpmul_cc	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into xbuf of the second multiplicand, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
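In this single-buffer form both operands come from xbuf, so multiplying each lane by the lane at the mirrored position amounts to xstart = zstart = 0 with xoffs built from {0, 1, 2, 3} and zoffs from {3, 2, 1, 0}. The hypothetical helper below packs such offset words and enforces the [0,7] range per lane; the nibble ordering (lane 0 in the least significant bits) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the intrinsics API): pack four complex-lane
// offsets into the 4 x 4-bit word used for xoffs/zoffs.
// ASSUMPTION: lane i is stored in bits [4*i, 4*i + 3] (lane 0 in the LSBs).
inline std::uint32_t pack_offsets4(const std::array<unsigned, 4>& offs) {
    std::uint32_t word = 0;
    for (int i = 0; i < 4; ++i) {
        assert(offs[i] <= 7);  // highest (4th) bit of each field must be 0
        word |= static_cast<std::uint32_t>(offs[i]) << (4 * i);
    }
    return word;
}
```

Under that assumption, `pack_offsets4({0, 1, 2, 3})` gives the identity word 0x3210 and `pack_offsets4({3, 2, 1, 0})` gives 0x0123.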

v4cfloat fpmul_cc	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_cc	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into xbuf of the second multiplicand, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_cc	(	v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[i]) * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.
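Because conj(x) * conj(z) = conj(x * z), this form returns the conjugate of the ordinary complex product in every lane. The two hypothetical scalar functions below state that identity with `std::complex<float>`; they are illustrations, not part of the intrinsics API.

```cpp
#include <complex>

using cf = std::complex<float>;

// Per-lane result of the simple fpmul_cc form: conj(xbuf[i]) * conj(zbuf[i]).
inline cf cc_lane(cf x, cf z) { return std::conj(x) * std::conj(z); }

// Equivalent formulation: conjugate the ordinary product once.
inline cf cc_lane_alt(cf x, cf z) { return std::conj(x * z); }
```

For x = 1 + 2i and z = 3 + 4i, both functions give -5 - 10i.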

v4cfloat fpmul_cc	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_cc	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into xbuf of the second multiplicand, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_cn	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
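The two-cycle note can be made concrete. With x = a + bi and z = c + di, conj(x) * z = (ac + bd) + (ad - bc)i, which splits into two real 8-lane passes over interleaved data: the first produces a*c and a*d, the second adds b*d and subtracts b*c. The sketch below is a hypothetical host-side model and does not claim to reproduce the exact microcode: `real_pass` loosely follows the fpmul_conf pseudocode with an accumulator added, the interleaved layout {re0, im0, re1, im1, ...} is an assumption, and lane selections are plain index arrays rather than packed offset words.

```cpp
#include <array>

using v8 = std::array<float, 8>;  // ASSUMPTION: interleaved {re0, im0, re1, im1, ...}
using idx8 = std::array<int, 8>;

// Hypothetical helper: one 8-lane real pass,
//   out[i] = acc[i] + (-1)^neg[i] * x[xi[i]] * z[zi[i]]
v8 real_pass(const v8& acc, const v8& x, const idx8& xi,
             const v8& z, const idx8& zi, unsigned neg) {
    v8 out{};
    for (int i = 0; i < 8; ++i) {
        const float m = x[xi[i]] * z[zi[i]];
        out[i] = acc[i] + (((neg >> i) & 1u) ? -m : m);
    }
    return out;
}

// conj(x) * z on 4 complex lanes, with x = a + bi and z = c + di:
//   real = a*c + b*d,  imag = a*d - b*c
v8 conj_mul_two_pass(const v8& x, const v8& z) {
    // Pass 1: real lanes get a*c, imaginary lanes get a*d.
    const idx8 x1 = {0, 0, 2, 2, 4, 4, 6, 6};  // a (real part of x) for both lanes
    const idx8 z1 = {0, 1, 2, 3, 4, 5, 6, 7};  // c, d
    const v8 partial = real_pass(v8{}, x, x1, z, z1, 0x00);
    // Pass 2: real lanes add b*d, imaginary lanes subtract b*c.
    const idx8 x2 = {1, 1, 3, 3, 5, 5, 7, 7};  // b (imaginary part of x)
    const idx8 z2 = {1, 0, 3, 2, 5, 4, 7, 6};  // d, c
    return real_pass(partial, x, x2, z, z2, 0xAA);  // 0xAA negates odd (imaginary) lanes
}
```

The mask 0xAA in the second pass is the same per-lane negation pattern that the fpmul_conf pseudocode derives for fpadd_mixadd with addmask = 0.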

v4cfloat fpmul_cn	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into xbuf of the second multiplicand, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
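Because both operands of this overload come from xbuf, pointing zstart/zoffs at the same lanes as xstart/xoffs yields conj(x) * x = |x|^2, a per-lane power with a zero imaginary part. The scalar model below illustrates this; `ref_fpmul_cn_single` is a hypothetical name and the nibble ordering of the offset words is an assumption.

```cpp
#include <array>
#include <complex>
#include <vector>

using cf = std::complex<float>;

// Hypothetical scalar model (not the intrinsics API) of the single-buffer form:
//   ret[i] = conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
// ASSUMPTION: lane i's 4-bit offset is stored in bits [4*i, 4*i + 3].
// Out-of-range (including negative) indices throw via .at(); no wraparound is modeled.
std::array<cf, 4> ref_fpmul_cn_single(const std::vector<cf>& xbuf, int xstart, unsigned xoffs,
                                       int zstart, unsigned zoffs) {
    std::array<cf, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
        const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
        ret[i] = std::conj(xbuf.at(xi)) * xbuf.at(zi);
    }
    return ret;
}
```

For xbuf = {3 + 4i, 1 + i, 2i, 5} and identity offsets on both operands, the lanes come out as 25, 2, 4 and 25.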

v4cfloat fpmul_cn	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_cn	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into xbuf of the second multiplicand, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_cn	(	v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.
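One common reason to form conj(x) * z lane by lane is to compare two signals: the magnitude of the product is |x||z| and its argument is the phase of z relative to x (arg z - arg x, wrapped into [-pi, pi]). The hypothetical scalar helpers below illustrate this with `std::complex<float>` and `std::arg`; on the target, the product itself would come from this intrinsic.

```cpp
#include <cmath>
#include <complex>

using cf = std::complex<float>;

// Per-lane result of the simple fpmul_cn form: conj(xbuf[i]) * zbuf[i].
inline cf conj_mul(cf x, cf z) { return std::conj(x) * z; }

// Phase of z relative to x, taken from the argument of conj(x) * z.
inline float phase_diff(cf x, cf z) { return std::arg(conj_mul(x, z)); }
```

For example, x = i (phase pi/2) and z = -1 (phase pi) give a relative phase of pi/2.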

v4cfloat fpmul_cn	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_cn	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice so that the two terms of the complex multiplication are added together.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset into xbuf of the second multiplicand, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v8float fpmul_conf	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes: for complex vectors, the real and imaginary parts of each complex value are handled as separate lanes.

 ~~~~~~~~~~~~~~~~~~~
 if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
 if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
 if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
 if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
 for (i = 0; i < 8; i++)
   m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
   n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
   o[i] = 0.0 + n[i]
   if   cmpmode == fpcmp_nrm :
     cmp[i] = signbit(o[i])
     ret[i] = o[i]
   elif cmpmode == fpcmp_lt :
     cmp[i] = signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
   elif cmpmode == fpcmp_ge :
     cmp[i] = ~signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
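The first four lines of the pseudocode turn addmode and addmask into a per-lane negation mask. The sketch below restates that table in C++; `AddMode` is a hypothetical stand-in for the fpadd_* constants, whose actual values come from the intrinsics headers and are not reproduced here.

```cpp
#include <cstdint>

// Hypothetical stand-ins for fpadd_add, fpadd_sub, fpadd_mixadd and
// fpadd_mixsub; the real constants come from the intrinsics headers.
enum class AddMode { add, sub, mixadd, mixsub };

// Per-lane negation mask from the pseudocode: bit i set => lane i is negated.
inline std::uint8_t neg_mask(AddMode mode, std::uint8_t addmask) {
    std::uint8_t flip = 0x00;
    switch (mode) {
        case AddMode::add:    flip = 0x00; break;  // negate exactly the addmask lanes
        case AddMode::sub:    flip = 0xFF; break;  // negate every lane not in addmask
        case AddMode::mixadd: flip = 0xAA; break;  // odd lanes flipped
        case AddMode::mixsub: flip = 0x55; break;  // even lanes flipped
    }
    return static_cast<std::uint8_t>(addmask ^ flip);
}

// Apply the mask to one lane value.
inline float apply_neg(float value, std::uint8_t neg, int lane) {
    return ((neg >> lane) & 1u) ? -value : value;
}
```

With fpadd_sub, a set addmask bit therefore cancels the negation for that lane rather than adding one.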

v8float fpmul_conf	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes: for complex vectors, the real and imaginary parts of each complex value are handled as separate lanes.

 ~~~~~~~~~~~~~~~~~~~
 if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
 if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
 if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
 if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
 for (i = 0; i < 8; i++)
   m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
   n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
   o[i] = 0.0 + n[i]
   if   cmpmode == fpcmp_nrm :
     ret[i] = o[i]
   elif cmpmode == fpcmp_lt :
     cmp[i] = signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
   elif cmpmode == fpcmp_ge :
     cmp[i] = ~signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
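For checking offset words and masks on a host, the fpcmp_nrm path of the pseudocode can be modeled lane by lane. This is a sketch under assumptions: `ref_fpmul_conf_nrm` is a hypothetical name, it takes the already-derived negation mask instead of addmode/addmask, it covers only the two-buffer v8float form, and lane i's 4-bit offset is assumed to sit in bits [4*i, 4*i+3]. The explicit `0.0f + n` mirrors o[i] = 0.0 + n[i], which turns a -0.0 product into +0.0.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

using v8 = std::array<float, 8>;

// Hypothetical scalar model of fpmul_conf restricted to cmpmode == fpcmp_nrm.
// `neg` is the per-lane negation mask already derived from addmode/addmask.
// ASSUMPTION: lane i's 4-bit offset is stored in bits [4*i, 4*i + 3].
v8 ref_fpmul_conf_nrm(const v8& xbuf, int xstart, std::uint32_t xoffs,
                      const v8& zbuf, int zstart, std::uint32_t zoffs,
                      bool ones, bool abs_val, std::uint8_t neg) {
    v8 ret{};
    for (int i = 0; i < 8; ++i) {
        const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
        const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 8);
        assert(ones || (zi >= 0 && zi < 8));  // zbuf is not read when ones is true
        const float m = xbuf[xi] * (ones ? 1.0f : zbuf[zi]);
        float n = abs_val ? std::fabs(m) : m;
        if ((neg >> i) & 1u) n = -n;
        ret[i] = 0.0f + n;  // o[i] = 0.0 + n[i]
    }
    return ret;
}
```

Setting `ones = true` with xoffs = 0x01234567 turns the call into a lane reversal of X under the same nibble-order assumption, since lane i then copies xbuf[7 - i].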

v8float fpmul_conf	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes: for complex vectors, the real and imaginary parts of each complex value are handled as separate lanes.

 ~~~~~~~~~~~~~~~~~~~
 if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
 if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
 if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
 if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
 for (i = 0; i < 8; i++)
   m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
   n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
   o[i] = 0.0 + n[i]
   if   cmpmode == fpcmp_nrm :
     cmp[i] = signbit(o[i])
     ret[i] = o[i]
   elif cmpmode == fpcmp_lt :
     cmp[i] = signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
   elif cmpmode == fpcmp_ge :
     cmp[i] = ~signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf of the second multiplicand, for all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
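In this overload the second multiplicand is also read from xbuf, so using the same offset word for both operands squares each lane, while a pair-swapping zoffs such as 0x67452301 multiplies each lane by its neighbor (for interleaved complex data, that would pair each real part with its imaginary part). The model below covers only ones = false, abs = false, no negation and fpcmp_nrm; its name and the nibble ordering of the offset words are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using v8 = std::array<float, 8>;

// Hypothetical model of the single-buffer fpmul_conf form with ones = false,
// abs = false, no negation and cmpmode == fpcmp_nrm:
//   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
// ASSUMPTION: lane i's 4-bit offset is stored in bits [4*i, 4*i + 3].
v8 ref_fpmul_conf_single(const v8& xbuf, int xstart, std::uint32_t xoffs,
                         int zstart, std::uint32_t zoffs) {
    v8 ret{};
    for (int i = 0; i < 8; ++i) {
        const int xi = xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu);
        const int zi = zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 8 && zi >= 0 && zi < 8);
        ret[i] = 0.0f + xbuf[xi] * xbuf[zi];  // both operands come from xbuf
    }
    return ret;
}
```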

v8float fpmul_conf	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes: for complex vectors, the real and imaginary parts of each complex value are handled as separate lanes.

 ~~~~~~~~~~~~~~~~~~~
 if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
 if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
 if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
 if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
 for (i = 0; i < 8; i++)
   m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
   n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
   o[i] = 0.0 + n[i]
   if   cmpmode == fpcmp_nrm :
     ret[i] = o[i]
   elif cmpmode == fpcmp_lt :
     cmp[i] = signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
   elif cmpmode == fpcmp_ge :
     cmp[i] = ~signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf of the second multiplicand, for all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes: for complex vectors, the real and imaginary parts of each complex value are handled as separate lanes.

 ~~~~~~~~~~~~~~~~~~~
 if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
 if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
 if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
 if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
 for (i = 0; i < 8; i++)
   m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
   n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
   o[i] = 0.0 + n[i]
   if   cmpmode == fpcmp_nrm :
     cmp[i] = signbit(o[i])
     ret[i] = o[i]
   elif cmpmode == fpcmp_lt :
     cmp[i] = signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
   elif cmpmode == fpcmp_ge :
     cmp[i] = ~signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
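This overload multiplies a real v8float by a v4cfloat. One plausible use, sketched below, is scaling each complex value by a real gain: with xoffs = 0x33221100 the gain for complex lane k is selected for both its real and its imaginary float lane. The function name is hypothetical, xstart = zstart = 0 is fixed, and addressing zbuf as eight interleaved float lanes through zoffs is an assumption about this overload rather than something stated above.

```cpp
#include <array>
#include <complex>
#include <cstdint>

using cf = std::complex<float>;

// Hypothetical model (xstart = zstart = 0, ones = false, no negation,
// fpcmp_nrm) of a real v8float times v4cfloat product.
// ASSUMPTIONS: z is addressed as 8 interleaved float lanes {re0, im0, re1, im1, ...},
// and lane i's 4-bit offset is stored in bits [4*i, 4*i + 3].
std::array<cf, 4> scale_complex(const std::array<float, 8>& x, std::uint32_t xoffs,
                                const std::array<cf, 4>& z, std::uint32_t zoffs) {
    std::array<float, 8> zf{};
    for (int k = 0; k < 4; ++k) {  // flatten complex lanes into float lanes
        zf[2 * k] = z[k].real();
        zf[2 * k + 1] = z[k].imag();
    }
    std::array<float, 8> out{};
    for (int i = 0; i < 8; ++i) {  // .at() rejects offsets beyond the 8 float lanes
        out[i] = x.at((xoffs >> (4 * i)) & 0xFu) * zf.at((zoffs >> (4 * i)) & 0xFu);
    }
    std::array<cf, 4> ret{};
    for (int k = 0; k < 4; ++k) ret[k] = cf(out[2 * k], out[2 * k + 1]);  // re-pair lanes
    return ret;
}
```

With gains {2, 10, -1, 0.5} in the first four lanes of x and zoffs = 0x76543210, each complex value of z is scaled by its own gain.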

v4cfloat fpmul_conf	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes: for complex vectors, the real and imaginary parts of each complex value are handled as separate lanes.

 ~~~~~~~~~~~~~~~~~~~
 if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
 if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
 if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
 if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
 for (i = 0; i < 8; i++)
   m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
   n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
   o[i] = 0.0 + n[i]
   if   cmpmode == fpcmp_nrm :
     ret[i] = o[i]
   elif cmpmode == fpcmp_lt :
     cmp[i] = signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
   elif cmpmode == fpcmp_ge :
     cmp[i] = ~signbit(o[i])
     ret[i] = cmp[i] ? -n[i] : 0.0
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v8float fpmul_conf	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is "fpcmp_ge" or "fpcmp_lt", a bit is set for each lane in which the accumulator was chosen.

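The cmp overloads report one bit per lane. Below is a minimal C++ sketch of how the pseudocode derives cmp[i] from n[i]; CmpMode is a hypothetical stand-in for fpcmp_nrm, fpcmp_lt and fpcmp_ge, lane i is assumed to map to bit i, and only the cmp computation is modeled, not the selection into ret. Note that o[i] = 0.0 + n[i] turns a negative zero into +0.0 under IEEE 754 round-to-nearest, so a -0.0 product does not set the sign bit.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for fpcmp_nrm, fpcmp_lt and fpcmp_ge.
enum class CmpMode { Nrm, Lt, Ge };

// Packs one cmp bit per lane (bit i = lane i) following the pseudocode.
unsigned cmp_bits(const std::array<float, 8>& n, CmpMode mode) {
    unsigned bits = 0;
    for (int i = 0; i < 8; ++i) {
        float o = 0.0f + n[i];            // o[i] = 0.0 + n[i]; -0.0 becomes +0.0
        bool neg = std::signbit(o);       // signbit(o[i])
        bool bit = (mode == CmpMode::Ge) ? !neg : neg;  // fpcmp_ge inverts it
        bits |= static_cast<unsigned>(bit) << i;
    }
    return bits;
}
```

With n = {1, -1, -0.0, 0, -2, 3, -4, 5}, the sketch yields 0x52 for fpcmp_nrm and fpcmp_lt, and the complementary 0xAD for fpcmp_ge.
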
v8float fpmul_conf	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.

v8float fpmul_conf	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is "fpcmp_ge" or "fpcmp_lt", a bit is set for each lane in which the accumulator was chosen.

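In the overloads without zbuf, the pseudocode reads both multiplicands from xbuf. The sketch below models only the product stage m[i] on the host; the function name is hypothetical, and it assumes that lane i's offset is the i-th 4-bit field of xoffs and zoffs (lane 0 in the lowest nibble) and that indices do not wrap around the buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of m[i] for the overloads without zbuf.
template <std::size_t N>
std::array<float, 8> products_single_buffer(const std::array<float, N>& xbuf,
                                            int xstart, unsigned xoffs,
                                            int zstart, unsigned zoffs, bool ones) {
    assert(xstart >= 0 && zstart >= 0);
    std::array<float, 8> m{};
    for (int i = 0; i < 8; ++i) {
        // Assumption: lane i uses bits [4*i+3 : 4*i] of the offset word.
        std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        assert(xi < N);
        float z = 1.0f;  // ones == true: second multiplicand replaced with 1.0
        if (!ones) {
            std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
            assert(zi < N);
            z = xbuf[zi];  // the second multiplicand also comes from xbuf
        }
        m[i] = xbuf[xi] * z;
    }
    return m;
}
```

With a 16-element buffer holding 0 to 15 and identity offsets 0x76543210 for both operands, each lane computes the square of its index; setting ones to true instead returns the selected xbuf elements unchanged.
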
v8float fpmul_conf	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is "fpcmp_ge" or "fpcmp_lt", a bit is set for each lane in which the accumulator was chosen.

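Because each complex float occupies two lanes, the negation mask can act on real and imaginary parts independently. The helpers below are a hypothetical host model (not part of the intrinsic API) that assumes complex element k maps to lane 2k (real part) and lane 2k+1 (imaginary part). Under that assumption, the mask 0xAA that fpadd_mixadd produces with addmask = 0 negates exactly the imaginary lanes, which conjugates each complex value.

```cpp
#include <array>
#include <cassert>
#include <complex>

// Assumption: complex element k <-> lanes 2k (real) and 2k+1 (imaginary).
std::array<float, 8> to_lanes(const std::array<std::complex<float>, 4>& z) {
    std::array<float, 8> lanes{};
    for (int k = 0; k < 4; ++k) {
        lanes[2 * k]     = z[k].real();
        lanes[2 * k + 1] = z[k].imag();
    }
    return lanes;
}

std::array<std::complex<float>, 4> from_lanes(const std::array<float, 8>& lanes) {
    std::array<std::complex<float>, 4> z{};
    for (int k = 0; k < 4; ++k)
        z[k] = std::complex<float>(lanes[2 * k], lanes[2 * k + 1]);
    return z;
}

// Applies n[i] = (-1)^neg[i] * m[i] for an 8-lane mask.
std::array<float, 8> apply_neg(std::array<float, 8> m, unsigned neg) {
    assert(neg <= 0xFFu && "neg is an 8-lane mask");
    for (int i = 0; i < 8; ++i)
        if ((neg >> i) & 1u) m[i] = -m[i];
    return m;
}
```

Round-tripping a v4cfloat-like value through to_lanes, apply_neg(..., 0xAA) and from_lanes matches std::conj element by element.
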
v4cfloat fpmul_conf	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.

v8float fpmul_conf	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is "fpcmp_ge" or "fpcmp_lt", a bit is set for each lane in which the accumulator was chosen.

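The xoffs and zoffs parameters pack eight 4-bit offsets into one word. The hypothetical helpers below sketch building and decoding such a word on the host; the lane-to-nibble order (lane 0 in bits 3:0, lane 7 in bits 31:28) is an assumption, so check it against the lane selection scheme documented for your toolchain before relying on it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs eight per-lane offsets into an 8 x 4-bit word.
// Assumption: lane i occupies bits [4*i+3 : 4*i].
std::uint32_t pack_offsets(const std::array<unsigned, 8>& lane_off) {
    std::uint32_t word = 0;
    for (int i = 0; i < 8; ++i) {
        assert(lane_off[i] < 16 && "each offset must fit in 4 bits");
        word |= static_cast<std::uint32_t>(lane_off[i]) << (4 * i);
    }
    return word;
}

// Extracts the 4-bit offset of one lane from a packed word.
unsigned lane_offset(std::uint32_t word, int lane) {
    assert(lane >= 0 && lane < 8);
    return (word >> (4 * lane)) & 0xFu;
}
```

Under this assumption, the identity selection (lane i reads xstart + i) packs to 0x76543210.
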
v8float fpmul_conf	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.

v8float fpmul_conf	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is "fpcmp_ge" or "fpcmp_lt", a bit is set for each lane in which the accumulator was chosen.

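For cmpmode == fpcmp_nrm the pseudocode reduces to ret[i] = 0.0 + (-1)^neg[i] * (abs ? |m[i]| : m[i]). The hypothetical function below models that path on the host for already computed products m. It shows that the absolute value is applied before the sign, so abs = true on a negated lane always gives a non-positive result, and that the 0.0 + step returns +0.0 rather than -0.0.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical scalar model of the fpcmp_nrm path for one call.
std::array<float, 8> nrm_result(const std::array<float, 8>& m, bool abs, unsigned neg) {
    assert(neg <= 0xFFu && "neg is an 8-lane mask");
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        float v = abs ? std::fabs(m[i]) : m[i];  // abs is applied first
        if ((neg >> i) & 1u) v = -v;             // then (-1)^neg[i]
        ret[i] = 0.0f + v;                       // o[i] = 0.0 + n[i]
    }
    return ret;
}
```

Combining abs = true with fpadd_sub (neg = 0xFF when addmask = 0) therefore yields -|m[i]| in every lane, with zero products coming out as +0.0.
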
v8float fpmul_conf	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset into xbuf for all lanes of the second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset into xbuf for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is "fpcmp_ge" or "fpcmp_lt", a bit is set for each lane in which the accumulator was chosen.

v4cfloat fpmul_conf	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is "fpcmp_ge" or "fpcmp_lt", a bit is set for each lane in which the accumulator was chosen.

v4cfloat fpmul_conf	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.

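When xbuf is a v4cfloat and zbuf a v8float, the pseudocode multiplies each real-valued lane of X by one float of Z; no complex cross terms are formed. The sketch below models this on the host for identity offsets (lane i reads element i), under the assumed interleaving of lane 2k for the real part and lane 2k+1 for the imaginary part; the function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <complex>

// Hypothetical model of m[i] for complex X times real Z with identity offsets.
// Assumption: complex element k occupies lanes 2k (real) and 2k+1 (imaginary).
std::array<std::complex<float>, 4> lane_products(const std::array<std::complex<float>, 4>& x,
                                                 const std::array<float, 8>& z) {
    std::array<std::complex<float>, 4> out{};
    for (int k = 0; k < 4; ++k) {
        float re = x[k].real() * z[2 * k];      // m[2k]   = Re(x[k]) * z[2k]
        float im = x[k].imag() * z[2 * k + 1];  // m[2k+1] = Im(x[k]) * z[2k+1]
        out[k] = std::complex<float>(re, im);
    }
    return out;
}
```

If z holds the same value s in lanes 2k and 2k+1, the result for element k equals x[k] * s, that is, a complex-by-real scaling; different values in the two lanes scale the real and imaginary parts independently.
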
v4cfloat fpmul_conf	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When cmpmode is "fpcmp_ge" or "fpcmp_lt", a bit is set for each lane in which the accumulator was chosen.

v4cfloat fpmul_conf	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true, the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select, per lane, the minimum of the accumulator and the multiplication result, "fpcmp_ge" to select the maximum, or "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be treated as 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
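For checking results on the host, the fpcmp_nrm path of the pseudocode can be modeled in scalar code. This is a sketch under stated assumptions, not the intrinsic: offsets are treated as float-lane indices into xbuf, lane i uses nibble i of the offset word (lane 0 in the least significant nibble), indices are assumed to stay in range, and fpmul_conf_nrm_model / lane_offset are hypothetical names. The fpcmp_lt/fpcmp_ge paths are left out because they depend on acc, which this overload does not take as a parameter.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Extract the 4-bit offset of lane i from an "8 x 4 bits" offset word.
// Assumption: lane 0 sits in bits 3..0, lane 7 in bits 31..28.
inline unsigned lane_offset(std::uint32_t offs, int lane) {
    return (offs >> (4 * lane)) & 0xFu;
}

// Scalar model of the fpcmp_nrm path for the overloads without zbuf, where
// both multiplicands come from xbuf. N is the number of float lanes in xbuf
// (8 for v4cfloat, counting real and imaginary parts separately).
// neg is the per-lane negation mask derived from addmode/addmask.
template <std::size_t N>
std::array<float, 8> fpmul_conf_nrm_model(const std::array<float, N>& xbuf,
                                          int xstart, std::uint32_t xoffs,
                                          int zstart, std::uint32_t zoffs,
                                          bool ones, bool use_abs,
                                          std::uint8_t neg) {
    assert(xstart >= 0 && zstart >= 0);
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
        // Assumption: indices stay inside the buffer (no wrap-around modeled).
        assert(xi < N && (ones || zi < N));
        float m = xbuf[xi] * (ones ? 1.0f : xbuf[zi]);  // m[i]
        float n = use_abs ? std::fabs(m) : m;           // |m[i]| if abs
        if ((neg >> i) & 1u) n = -n;                     // (-1)^neg[i]
        ret[i] = 0.0f + n;  // o[i]: adding +0.0 maps -0.0 to +0.0
    }
    return ret;
}
```

The final `0.0f + n` mirrors `o[i] = 0.0 + n[i]`; under IEEE 754 round-to-nearest, +0.0 + -0.0 is +0.0, so a negated zero product comes out as a positive zero.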

v4cfloat fpmul_conf	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
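The xoffs and zoffs arguments pack eight 4-bit lane offsets into one unsigned int. The helper below is a sketch of that packing, assuming lane 0 occupies the least significant nibble (so the identity permutation is 0x76543210); pack_offsets is a hypothetical name, not part of the intrinsics API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack eight per-lane offsets (each 0..15) into an "8 x 4 bits" word:
// lane 0 occupies bits 3..0, lane 7 occupies bits 31..28.
inline std::uint32_t pack_offsets(const std::array<unsigned, 8>& lanes) {
    std::uint32_t word = 0;
    for (int i = 0; i < 8; ++i) {
        assert(lanes[i] <= 0xF && "each offset must fit in 4 bits");
        word |= static_cast<std::uint32_t>(lanes[i]) << (4 * i);
    }
    return word;
}
```

Because the offsets are normally compile-time patterns, writing them as hexadecimal literals directly is usually simpler; a helper like this is mainly useful when offsets are computed at runtime.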

v4cfloat fpmul_conf	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
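For the complex-times-real overloads (v8cfloat xbuf, v8float zbuf), the lane view suggests one way to scale each complex value by a real factor: both output lanes of complex value k must read zbuf[k], which gives zoffs = 0x33221100 with the identity xoffs = 0x76543210. The scalar sketch below checks that reasoning; it assumes offsets index float lanes, real parts sit at even lanes, xstart = zstart = 0, and lane_products is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

// Scalar model of m[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
// for 8 output float lanes (ones = false, no lanes negated).
template <std::size_t NX, std::size_t NZ>
std::array<float, 8> lane_products(const std::array<float, NX>& xbuf, unsigned xstart,
                                   std::uint32_t xoffs,
                                   const std::array<float, NZ>& zbuf, unsigned zstart,
                                   std::uint32_t zoffs) {
    std::array<float, 8> out{};
    for (int i = 0; i < 8; ++i) {
        unsigned xi = xstart + ((xoffs >> (4 * i)) & 0xFu);
        unsigned zi = zstart + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < NX && zi < NZ);  // assumption: no wrap-around modeled
        out[i] = xbuf[xi] * zbuf[zi];
    }
    return out;
}
```

Called as `lane_products(x, 0, 0x76543210u, z, 0, 0x33221100u)` with x holding interleaved complex values, lanes 2k and 2k+1 of the result are the real and imaginary parts of x[k] * z[k].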

v4cfloat fpmul_conf	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
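Because each output lane holds a single real product, a full complex-times-complex multiply takes two passes whose results are summed. The sketch below derives one possible lane configuration from the pseudocode: pass 1 computes (xr*zr, xi*zr) with no negation, and pass 2 computes (-xi*zi, xr*zi) using swapped X offsets and the even-lane negation that fpadd_mixsub gives with addmask = 0 (mask 0x55). It assumes offsets index float lanes, real parts sit at even lanes, and xstart = zstart = 0; mul_pass and cmul_two_pass are hypothetical names, and on the device the second pass would presumably be a multiply-accumulate such as fpmac_conf rather than a separate sum.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>

// One pass of the fpcmp_nrm datapath: r[i] = +/- x[xoffs[i]] * z[zoffs[i]],
// with lane i negated when bit i of neg is set.
template <std::size_t NX, std::size_t NZ>
std::array<float, 8> mul_pass(const std::array<float, NX>& x, std::uint32_t xoffs,
                              const std::array<float, NZ>& z, std::uint32_t zoffs,
                              std::uint8_t neg) {
    std::array<float, 8> r{};
    for (int i = 0; i < 8; ++i) {
        std::size_t xi = (xoffs >> (4 * i)) & 0xFu;
        std::size_t zi = (zoffs >> (4 * i)) & 0xFu;
        assert(xi < NX && zi < NZ);
        float m = x[xi] * z[zi];
        r[i] = ((neg >> i) & 1u) ? -m : m;
    }
    return r;
}

// Product of the first four complex values of x and z from two real-lane passes.
template <std::size_t NX, std::size_t NZ>
std::array<float, 8> cmul_two_pass(const std::array<float, NX>& x,
                                   const std::array<float, NZ>& z) {
    // Pass 1: (xr*zr, xi*zr); no lanes negated (fpadd_add, addmask 0).
    auto p1 = mul_pass(x, 0x76543210u, z, 0x66442200u, 0x00);
    // Pass 2: (-xi*zi, xr*zi); even lanes negated (fpadd_mixsub, addmask 0).
    auto p2 = mul_pass(x, 0x67452301u, z, 0x77553311u, 0x55);
    std::array<float, 8> out{};
    for (int i = 0; i < 8; ++i) out[i] = p1[i] + p2[i];  // accumulate the two terms
    return out;
}
```

Summing the passes gives (xr*zr - xi*zi) in the real lane and (xi*zr + xr*zi) in the imaginary lane, which is the usual complex product.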

v4cfloat fpmul_conf	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).
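With ones = true the multiply degenerates into a signed lane selection, since every lane is just a possibly negated copy of xbuf[xstart + xoffs[i]]. Read that way, identity offsets combined with fpadd_mixadd and addmask = 0 (negation mask 0xAA, odd lanes) would produce the complex conjugate of the selected values. The sketch below models that case; it assumes offsets index float lanes with real parts at even lanes, and signed_select is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of the fpcmp_nrm path with ones = true and abs = false:
// ret[i] = 0.0 + (-1)^neg[i] * xbuf[xstart + xoffs[i]].
template <std::size_t N>
std::array<float, 8> signed_select(const std::array<float, N>& xbuf, std::size_t xstart,
                                   std::uint32_t xoffs, std::uint8_t neg) {
    std::array<float, 8> r{};
    for (int i = 0; i < 8; ++i) {
        std::size_t xi = xstart + ((xoffs >> (4 * i)) & 0xFu);
        assert(xi < N);  // assumption: no wrap-around modeled
        float n = ((neg >> i) & 1u) ? -xbuf[xi] : xbuf[xi];
        r[i] = 0.0f + n;
    }
    return r;
}
```

For example, `signed_select(x, 0, 0x76543210u, 0xAA)` keeps the real lanes and flips the sign of every imaginary lane of the first four complex values.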

v4cfloat fpmul_conf	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).

v4cfloat fpmul_conf	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).

v4cfloat fpmul_conf	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of second multiplicand. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_conf	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode,
		unsigned int &	cmp
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp	8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", it sets a bit if accumulator was chosen (per lane).

v4cfloat fpmul_conf	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs,
		bool	ones,
		bool	abs,
		unsigned int	addmode,
		unsigned int	addmask,
		unsigned int	cmpmode
	)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if   cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset for all lanes of second multiplicand.
zoffs	8 x 4 bits: Additional lane-dependent offset for second multiplicand.
ones	If true, the second multiplicand is replaced with 1.0.
abs	If true the absolute value is taken before accumulation.
addmode	Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask	8 x 1 LSB bits: Corresponding lane is negated if bit is set (depending on addmode).
cmpmode	Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.

v4cfloat fpmul_nc	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_nc	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand within X, for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
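The two terms mentioned in the note can be made explicit by expanding the product. Each bracket below contains exactly one real product per output lane, which is why the operation maps onto two passes; the grouping shown is an illustrative derivation, not a statement of how the microcode orders its instructions. In the lane view, the second bracket negates the imaginary (odd) lanes, the pattern fpadd_mixadd yields with addmask = 0.

```latex
x \cdot \overline{z} = (x_r + i\,x_i)(z_r - i\,z_i)
  = \underbrace{\left(x_r z_r + i\,x_i z_r\right)}_{\text{term 1}}
  + \underbrace{\left(x_i z_i - i\,x_r z_i\right)}_{\text{term 2}}
  = (x_r z_r + x_i z_i) + i\,(x_i z_r - x_r z_i)
```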

v4cfloat fpmul_nc	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_nc	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand in X, common to all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_nc	(	v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpmul_nc	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
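If the per-lane offsets are only known at run time, they first have to be packed into the 4 x 4-bit word. A minimal sketch of that packing, assuming lane 0 occupies the least significant nibble; model_pack_offsets4 is a hypothetical helper, not part of the intrinsic API.

```c
#include <assert.h>

/* Hypothetical helper, not part of the intrinsic API: packs four per-lane
 * complex-lane offsets into one 4 x 4-bit word.
 * Assumption: lane 0 occupies the least significant nibble. */
static unsigned int model_pack_offsets4(const int offs[4])
{
    unsigned int word = 0;

    for (int i = 0; i < 4; i++) {
        /* The highest (4th) bit must be 0, so each offset must lie in [0,7]. */
        assert(offs[i] >= 0 && offs[i] <= 7);
        word |= (unsigned int)offs[i] << (4 * i);
    }
    return word;
}
```

For example, the offsets {0, 1, 2, 3} pack to 0x3210 and {7, 0, 5, 2} pack to 0x2507.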

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpmul_nc	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand in X, common to all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v8float fpneg_abs_mul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
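The lane selection above can be mirrored by a scalar model for computing expected results on a host. The sketch below covers the two-buffer forms with a v8float, v16float or v32float X buffer through an xsize argument. It assumes lane i's offset is nibble i of each 8 x 4-bit word and that all indices stay in range; model_fpneg_abs_mul is a hypothetical name.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical host-side model of the two-buffer fpneg_abs_mul forms; not
 * the intrinsic. xsize is 8, 16 or 32 (v8float, v16float or v32float X
 * buffer); the Z buffer has 8 lanes.
 * Assumption: lane i's offset is nibble i of xoffs and zoffs. */
static void model_fpneg_abs_mul(const float *xbuf, int xsize,
                                int xstart, unsigned int xoffs,
                                const float zbuf[8], int zstart, unsigned int zoffs,
                                float ret[8])
{
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);

        /* Out-of-range behavior of the intrinsic is not modeled here. */
        assert(xi >= 0 && xi < xsize && zi >= 0 && zi < 8);
        ret[i] = -fabsf(xbuf[xi] * zbuf[zi]);
    }
}
```

Apart from NaN inputs, every lane of the result is less than or equal to zero, regardless of the input signs.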

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpneg_abs_mul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand in X, common to all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v8float fpneg_abs_mul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpneg_abs_mul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand in X, common to all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v8float fpneg_abs_mul	(	v8float	xbuf,
		v8float	zbuf
	)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - abs(xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v8float fpneg_abs_mul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpneg_abs_mul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand in X, common to all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v8float fpneg_mul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpneg_mul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand in X, common to all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v4cfloat fpneg_mul	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
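In this real-times-complex form, each real factor scales both the real and the imaginary part of its complex factor before the negation. A hypothetical host-side model of the v8float x v4cfloat form (not the intrinsic; C99 `float complex` stands in for v4cfloat, and lane i's offset is assumed to be nibble i of each offset word):

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical model of fpneg_mul (v8float xbuf, v4cfloat zbuf); not the
 * intrinsic. xoffs selects real lanes, zoffs selects complex lanes.
 * Assumption: lane i's offset is nibble i of each offset word. */
static void model_fpneg_mul_rc(const float xbuf[8], int xstart, unsigned int xoffs,
                               const float complex zbuf[4], int zstart, unsigned int zoffs,
                               float complex ret[4])
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        unsigned int znib = (zoffs >> (4 * i)) & 0xFu;
        int zi = zstart + (int)znib;

        assert((znib & 0x8u) == 0);  /* 4th bit of each Z offset must be 0 */
        assert(xi >= 0 && xi < 8 && zi >= 0 && zi < 4);
        /* A real factor scales both components of the complex lane. */
        ret[i] = -(xbuf[xi] * zbuf[zi]);
    }
}
```

For example, x = 3 and z = 3 - i give -(9 - 3i) = -9 + 3i.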

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v8float fpneg_mul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpneg_mul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
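Because both factors come from the same buffer in this form, one possible use is multiplying neighboring samples. The sketch below is a hypothetical host-side model (model_fpneg_mul_single is not an AI Engine function) that assumes lane i's offset is nibble i of each offset word; with xstart = 0, zstart = 1 and both offset words equal to 0x76543210, lane i yields -(x[i] * x[i + 1]).

```c
#include <assert.h>

/* Hypothetical host-side model of the single-buffer fpneg_mul forms; not the
 * intrinsic. xsize is the number of real lanes in xbuf (8, 16 or 32).
 * Assumption: lane i's offset is nibble i of xoffs and zoffs. */
static void model_fpneg_mul_single(const float *xbuf, int xsize,
                                   int xstart, unsigned int xoffs,
                                   int zstart, unsigned int zoffs,
                                   float ret[8])
{
    for (int i = 0; i < 8; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);

        /* Out-of-range behavior of the intrinsic is not modeled here. */
        assert(xi >= 0 && xi < xsize && zi >= 0 && zi < xsize);
        ret[i] = -(xbuf[xi] * xbuf[zi]);
    }
}
```

A v16float buffer is needed for this pattern so that lane 7 can read x[8].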

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand in X, common to all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v8float fpneg_mul	(	v8float	xbuf,
		v8float	zbuf
	)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - (xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpneg_mul	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul	(	v4float	xbuf,
		v4cfloat	zbuf
	)

Multiply and negate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v8float fpneg_mul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	8 x 4 bits: Additional lane-dependent offset for Z.

v8float fpneg_mul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	8 x 4 bits: Additional lane-dependent offset for X.
zstart	Starting offset of the second multiplicand in X, common to all lanes.
zoffs	8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.

v4cfloat fpneg_mul	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpneg_mul	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
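For complex times complex lanes, each result lane is the full complex product with its sign flipped: for x = a + bi and z = c + di, -(x * z) = -(ac - bd) - (ad + bc)i. A hypothetical scalar model of the v4cfloat x v4cfloat form (not the intrinsic; lane i's offset is assumed to be nibble i, and both buffers hold four complex lanes):

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical model of fpneg_mul (v4cfloat xbuf, v4cfloat zbuf); not the
 * intrinsic. Offsets refer to complex lanes.
 * Assumption: lane i's offset is nibble i of each offset word. */
static void model_fpneg_mul_cc4(const float complex xbuf[4], int xstart, unsigned int xoffs,
                                const float complex zbuf[4], int zstart, unsigned int zoffs,
                                float complex ret[4])
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);

        /* Out-of-range behavior of the intrinsic is not modeled here. */
        assert(xi >= 0 && xi < 4 && zi >= 0 && zi < 4);
        ret[i] = -(xbuf[xi] * zbuf[zi]);
    }
}
```

For example, (1 + 2i)(3 + 4i) = -5 + 10i, so the corresponding result lane is 5 - 10i.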

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand in X, common to all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
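Note that in this form xstart and xoffs count complex lanes while zstart and zoffs count real lanes. The hypothetical model below (not the intrinsic; nibble i of each offset word is assumed to hold lane i's offset) makes that indexing explicit; xsize is the number of complex lanes in xbuf, so 4, 8 or 16 for the v4cfloat, v8cfloat and v16cfloat overloads.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical model of the complex-times-real fpneg_mul forms; not the
 * intrinsic. X indices are complex lanes, Z indices are real lanes.
 * Assumption: lane i's offset is nibble i of each offset word. */
static void model_fpneg_mul_cr(const float complex *xbuf, int xsize,
                               int xstart, unsigned int xoffs,
                               const float zbuf[8], int zstart, unsigned int zoffs,
                               float complex ret[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned int xnib = (xoffs >> (4 * i)) & 0xFu;
        int xi = xstart + (int)xnib;                          /* complex lane */
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);   /* real lane */

        assert((xnib & 0x8u) == 0);  /* 4th bit of each X offset must be 0 */
        assert(xi >= 0 && xi < xsize && zi >= 0 && zi < 8);
        ret[i] = -(xbuf[xi] * zbuf[zi]);
    }
}
```

With xstart = 4, xoffs = 0x3210, zstart = 0 and zoffs = 0x6420, lane i multiplies complex lane i + 4 by real lane 2i.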

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpneg_mul	(	v4cfloat	xbuf,
		v8float	zbuf
	)

Multiply and negate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpneg_mul	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand in X, common to all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul	(	v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpneg_mul	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpneg_mul	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic fpmac_conf twice, accumulating one of the two terms of the complex multiplication in each call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0, so the range per lane is [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_c	(	v8float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
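A scalar reference model of this real times complex-conjugate overload is sketched below. It is hypothetical and assumes that xoffs indexes real lanes of the real X buffer, that zoffs indexes complex lanes of Z, and that lane 0 is the least significant nibble of each offset word.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical reference model: real X (8 lanes, like v8float) times the
 * conjugate of complex Z (4 lanes, like v4cfloat), negated. */
static void ref_fpneg_mul_c_rc(const float xbuf[8], int xstart, unsigned int xoffs,
                               const float complex zbuf[4], int zstart, unsigned int zoffs,
                               float complex ret[4])
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 8 && zi >= 0 && zi < 4);
        /* -(r * conj(z)) = -r*zr + j*r*zi */
        ret[i] = -(xbuf[xi] * conjf(zbuf[zi]));
    }
}
```

With xbuf = {1, ..., 8}, xoffs = 0x3210 and every lane reading z = 1 + 2j, lane 1 yields -(2 * (1 - 2j)) = -2 + 4j.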

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_c	(	v16float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_c	(	v4float	xbuf,
		v4cfloat	zbuf
	)

Multiply and negate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[i] * conj(zbuf[i]))
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpneg_mul_c	(	v32float	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X.
xoffs	4 x 4 bits: Additional lane-dependent offset for X.
zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_c	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpneg_mul_c	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpneg_mul_c	(	v4cfloat	xbuf,
		v8float	zbuf
	)

Multiply and negate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[i]) * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpneg_mul_c	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v8float	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs	4 x 4 bits: Additional lane-dependent offset for Z.

v4cfloat fpneg_mul_cc	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cc	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cc	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cc	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cc	(	v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[i]) * conj(zbuf[i]))
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpneg_mul_cc	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cc	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cn	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cn	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
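When the X and Z offsets of a lane select the same element, this single-buffer overload yields -(conj(x) * x) = -|x|^2, a real value with zero imaginary part. The hypothetical reference model below (buffer length n stands for the 4, 8 or 16 complex lanes of the overloads; lane 0 is assumed to be the least significant nibble) makes that easy to check on a host.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical reference model of the single-buffer fpneg_mul_cn overloads
 * for a buffer of n complex lanes. */
static void ref_fpneg_mul_cn_x(const float complex *xbuf, int n,
                               int xstart, unsigned int xoffs,
                               int zstart, unsigned int zoffs,
                               float complex ret[4])
{
    for (int i = 0; i < 4; i++) {
        int xi = xstart + (int)((xoffs >> (4 * i)) & 0xFu);
        int zi = zstart + (int)((zoffs >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < n && zi >= 0 && zi < n);
        ret[i] = -(conjf(xbuf[xi]) * xbuf[zi]);  /* conjugate only the X operand */
    }
}
```

Passing identical starts and offset words (for example 0 and 0x3210 for both) returns the negated squared magnitude of each selected element; for x = 1 + 2j that is -5.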

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cn	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cn	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cn	(	v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[i]) * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpneg_mul_cn	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_cn	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset of the second multiplicand (read from X) for all lanes. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_nc	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
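Since x * conj(z) equals conj(conj(x) * z), each lane of fpneg_mul_nc is the conjugate of the corresponding fpneg_mul_cn lane for the same operands. The two hypothetical per-lane helpers below make the relationship concrete.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical per-lane model of fpneg_mul_nc: -(x * conj(z)). */
static float complex lane_neg_mul_nc(float complex x, float complex z)
{
    return -(x * conjf(z));
}

/* Hypothetical per-lane model of fpneg_mul_cn: -(conj(x) * z). */
static float complex lane_neg_mul_cn(float complex x, float complex z)
{
    return -(conjf(x) * z);
}
```

For x = 1 + 2j and z = 3 + 4j, the _nc lane gives -11 - 2j and the _cn lane gives -11 + 2j, its conjugate.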

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication, and accumulating the results.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
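Since zstart must be a compile-time constant, and the offset words are often fixed as well, it can be convenient to build and validate offset words at compile time. The sketch below is a generic C++ constexpr packer; pack_offsets and kIdentityOffs are hypothetical names, not part of the intrinsic API, and the code assumes lane i occupies bits [4*i, 4*i + 3]. A lane offset outside [0,7] (4th bit set) fails to compile.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compile-time packer for a 4 x 4-bit offset word.
// Assumption: lane i occupies bits [4*i, 4*i + 3].
template <unsigned O0, unsigned O1, unsigned O2, unsigned O3>
constexpr std::uint32_t pack_offsets() {
    static_assert(O0 <= 7 && O1 <= 7 && O2 <= 7 && O3 <= 7,
                  "each lane offset must be in [0, 7] (4th bit clear)");
    return O0 | (O1 << 4) | (O2 << 8) | (O3 << 12);
}

// Identity pattern: output lane i reads input lane start + i.
constexpr std::uint32_t kIdentityOffs = pack_offsets<0, 1, 2, 3>();
static_assert(kIdentityOffs == 0x3210u, "identity pattern");
```

Writing pack_offsets<0, 1, 2, 8>() would be rejected by the static_assert, catching an invalid offset before it reaches the intrinsic.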

v4cfloat fpneg_mul_nc	(	v4cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = -(xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]]));
 ~~~~~~~~~~~~~~~~~~~
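In this variant both multiplicands come from xbuf. One consequence of the formula: when a lane's Z index equals its X index, that lane yields -|x|^2, a real, non-positive value. The hypothetical reference model below mirrors the single-buffer pseudocode using std::complex<float>, with the same assumed nibble ordering (lane i in bits [4*i, 4*i + 3]) and with out-of-range indices rejected by std::vector::at.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical reference model of the single-buffer form:
//   ret[i] = -(xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]]))
std::array<cfloat, 4> ref_fpneg_mul_nc_self(const std::vector<cfloat>& xbuf,
                                            int xstart, std::uint32_t xoffs,
                                            int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        // Assumed nibble order: lane i reads bits [4*i, 4*i + 3].
        const auto xi = static_cast<std::size_t>(
            xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu));
        const auto zi = static_cast<std::size_t>(
            zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu));
        ret[i] = -(xbuf.at(xi) * std::conj(xbuf.at(zi)));
    }
    return ret;
}
```

With xstart == zstart and xoffs == zoffs, every lane computes -|x|^2; for x = 3+4i that is -25.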

Note: This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_nc	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = -(xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]));
 ~~~~~~~~~~~~~~~~~~~
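With a v8cfloat xbuf and only 4 output lanes, xstart and xoffs together choose which of the 8 input lanes participate. The hypothetical helper below computes the effective X index of each output lane (xstart + xoffs[i]), again assuming lane 0's offset sits in the least significant nibble.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: effective input index for each of the 4 output lanes,
// start + offs[i], where offs[i] is assumed to be bits [4*i, 4*i + 3] of offs.
std::array<int, 4> lane_indices(int start, std::uint32_t offs) {
    std::array<int, 4> idx{};
    for (unsigned i = 0; i < 4; ++i) {
        idx[i] = start + static_cast<int>((offs >> (4 * i)) & 0xFu);
    }
    return idx;
}
```

Under that assumption, xstart = 0 with xoffs = 0x3210 selects the lower half of the buffer, xstart = 4 selects the upper half, and xoffs = 0x0000 broadcasts the single lane xstart to all four outputs.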

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, accumulating one term of the complex multiplication per call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_nc	(	v8cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = -(xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]]));
 ~~~~~~~~~~~~~~~~~~~

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, accumulating one term of the complex multiplication per call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_nc	(	v4cfloat	xbuf,
		v4cfloat	zbuf
	)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = -(xbuf[i] * conj(zbuf[i]));
 ~~~~~~~~~~~~~~~~~~~
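The two-argument form is the lane-wise special case of the general form: assuming lane 0's offset lives in the least significant nibble, it corresponds to xstart = zstart = 0 with identity offsets 0x3210. The sketch below checks that equivalence on two hypothetical reference models built on std::array<std::complex<float>, 4> rather than the real v4cfloat type.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;
using v4 = std::array<cfloat, 4>;

// Hypothetical model of the general form for 4-lane buffers
// (assumed nibble order: lane i reads bits [4*i, 4*i + 3]).
v4 ref_general(const v4& x, int xstart, std::uint32_t xoffs,
               const v4& z, int zstart, std::uint32_t zoffs) {
    v4 ret{};
    for (unsigned i = 0; i < 4; ++i) {
        const auto xi = static_cast<std::size_t>(
            xstart + static_cast<int>((xoffs >> (4 * i)) & 0xFu));
        const auto zi = static_cast<std::size_t>(
            zstart + static_cast<int>((zoffs >> (4 * i)) & 0xFu));
        ret[i] = -(x.at(xi) * std::conj(z.at(zi)));
    }
    return ret;
}

// Model of the two-argument form: ret[i] = -(x[i] * conj(z[i])).
v4 ref_lanewise(const v4& x, const v4& z) {
    v4 ret{};
    for (std::size_t i = 0; i < 4; ++i) ret[i] = -(x[i] * std::conj(z[i]));
    return ret;
}
```

Using the two-argument form when no lane permutation is needed simply avoids spelling out the identity start and offset values.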

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, accumulating one term of the complex multiplication per call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
zbuf	Second multiplication input buffer.

v4cfloat fpneg_mul_nc	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		v4cfloat	zbuf,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = -(xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]));
 ~~~~~~~~~~~~~~~~~~~
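Because each per-lane offset is limited to [0,7], one call reads X lanes only in the range [xstart, xstart + 7], even with a v16cfloat xbuf; other parts of the buffer are reached by changing xstart. A common pattern is a strided selection. The hypothetical helper below builds the offset word for offsets {0, s, 2s, 3s}, assuming lane i occupies bits [4*i, 4*i + 3], and rejects strides whose largest offset would leave [0,7].

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: packs offsets {0, s, 2s, 3s} into a 4 x 4-bit word
// (assumed nibble order: lane i in bits [4*i, 4*i + 3]). Returns std::nullopt
// when the largest offset, 3*s, would exceed the documented range [0, 7].
std::optional<std::uint32_t> strided_offsets(unsigned stride) {
    if (3 * stride > 7) return std::nullopt;
    std::uint32_t word = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        word |= (lane * stride) << (4 * lane);
    }
    return word;
}
```

Under the same assumption, xstart = 8 combined with the stride-2 word 0x6420 would select X lanes 8, 10, 12, and 14.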

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, accumulating one term of the complex multiplication per call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zbuf	Second multiplication input buffer.
zstart	Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

v4cfloat fpneg_mul_nc	(	v16cfloat	xbuf,
		int	xstart,
		unsigned int	xoffs,
		int	zstart,
		unsigned int	zoffs
	)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = -(xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]]));
 ~~~~~~~~~~~~~~~~~~~
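With the single-buffer form, choosing zstart = xstart + k with identical xoffs and zoffs gives each lane the negated product of a sample with the conjugate of the sample k lanes later, which is the kind of term a lag-k correlation accumulates. The sketch below models only that usage with std::complex<float>; neg_lag_products is a hypothetical name, the identity offsets are assumed to be 0x3210 (lane 0 in the lowest nibble), and the lag interpretation is an illustration rather than something this document prescribes.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical illustration: with identity offsets and zstart = xstart + lag,
// the single-buffer form computes ret[i] = -(x[xstart + i] * conj(x[xstart + lag + i])).
std::array<cfloat, 4> neg_lag_products(const std::vector<cfloat>& x,
                                       std::size_t xstart, std::size_t lag) {
    std::array<cfloat, 4> ret{};
    for (std::size_t i = 0; i < 4; ++i) {
        // std::vector::at rejects indices past the end of the 16-lane buffer.
        ret[i] = -(x.at(xstart + i) * std::conj(x.at(xstart + lag + i)));
    }
    return ret;
}
```

For a signal that rotates by a quarter turn per sample (1, j, -1, -j, ...), every lag-1 product x[n]*conj(x[n+1]) equals -j, so each lane of the negated result is +j; with lag 0 each lane is -|x|^2 = -1.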

Note: This is a two-cycle intrinsic that results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice, accumulating one term of the complex multiplication per call.

Returns: Result vector.

Parameters

xbuf	First multiplication input buffer.
xstart	Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs	4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Parameters

zstart	Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs	4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).

Note: When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.

Overview

Functions

Function Documentation