AI Engine Intrinsics (AIE) r2p21

Overview

Advanced floating-point vector operations. This page contains the fully configurable fpmac_conf intrinsic and convenience wrappers around it. The lane selection scheme is explained after each intrinsic definition.

Some of these floating-point operations can generate exceptions; for more information, see the linked exceptions documentation.

Functions

v8float fpabs_mul (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and take absolute value for single precision real times real floating point vectors.
 
v8float fpabs_mul (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and take absolute value for single precision real times real floating point vectors.
 
v8float fpabs_mul (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and take absolute value for single precision real times real floating point vectors.
 
v8float fpabs_mul (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and take absolute value for single precision real times real floating point vectors.
 
v8float fpabs_mul (v8float xbuf, v8float zbuf)
 Multiply and take absolute value for single precision real times real floating point vectors.
 
v8float fpabs_mul (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and take absolute value for single precision real times real floating point vectors.
 
v8float fpabs_mul (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and take absolute value for single precision real times real floating point vectors.
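
The two-argument fpabs_mul overload above can be pictured with a plain scalar loop. The following is a minimal host-side sketch, not the intrinsic itself: the type alias V8 and the function names fpabs_mul_ref and fpabs_mul_example are hypothetical, and the sketch assumes that lane i of the result is the absolute value of xbuf[i] * zbuf[i].

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the 8-lane v8float vector type.
using V8 = std::array<float, 8>;

// Scalar model of fpabs_mul(xbuf, zbuf).
// Assumption: lane i of the result is |xbuf[i] * zbuf[i]|.
V8 fpabs_mul_ref(const V8& xbuf, const V8& zbuf) {
    V8 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = std::fabs(xbuf[i] * zbuf[i]);
    }
    return out;
}

// Usage: the result lanes are non-negative whatever the operand signs are.
void fpabs_mul_example() {
    const V8 x{1.0f, -2.0f, 3.0f, -4.0f, 0.5f, -0.5f, 0.0f, 8.0f};
    const V8 z{-1.0f, -2.0f, -3.0f, 0.25f, 2.0f, 2.0f, 5.0f, -1.0f};
    const V8 r = fpabs_mul_ref(x, z);
    assert(r[0] == 1.0f);  // |1 * -1|
    assert(r[1] == 4.0f);  // |-2 * -2|
    assert(r[3] == 1.0f);  // |-4 * 0.25|
    assert(r[7] == 8.0f);  // |8 * -1|
}
```

A model like this can serve as a reference when checking kernel output on the host.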
 
v8float fpmac (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times real floating point vectors.
 
v8float fpmac (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times real floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times complex floating point vectors.
 
v8float fpmac (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times real floating point vectors.
 
v8float fpmac (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times real floating point vectors.
 
v8float fpmac (v8float acc, v8float xbuf, v8float zbuf)
 Multiply and accumulate for single precision real times real floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times complex floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v4float xbuf, v4cfloat zbuf)
 Multiply and accumulate for single precision real times complex floating point vectors.
 
v8float fpmac (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times real floating point vectors.
 
v8float fpmac (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times real floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times complex floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times real floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times real floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v4cfloat xbuf, v8float zbuf)
 Multiply and accumulate for single precision complex times real floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
 Multiply and accumulate for single precision complex times complex floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times real floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex floating point vectors.
 
v4cfloat fpmac (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex floating point vectors.
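
The fpmac overloads that take xstart, xoffs, zstart and zoffs select which buffer element feeds each output lane. The sketch below models one real times real overload (v16float xbuf, v8float zbuf) on the host. It rests on assumptions that this summary does not state: each of the 8 lanes reads a 4-bit field from xoffs and zoffs (lane i uses bits 4i+3..4i), and the selected index is start plus that field. The names V8, V16, lane_offset, fpmac_ref and fpmac_example are hypothetical. Out-of-range indices throw in this model; hardware behaviour for such indices is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8 = std::array<float, 8>;    // hypothetical stand-in for v8float
using V16 = std::array<float, 16>;  // hypothetical stand-in for v16float

// Assumption: lane i takes its 4-bit offset from bits [4i+3 : 4i].
inline unsigned lane_offset(std::uint32_t offs, std::size_t lane) {
    return (offs >> (4 * lane)) & 0xFu;
}

// Scalar model of
//   fpmac(acc, xbuf, xstart, xoffs, zbuf, zstart, zoffs)
// Assumption: out[i] = acc[i] + xbuf[xstart + xoffs_i] * zbuf[zstart + zoffs_i].
V8 fpmac_ref(const V8& acc, const V16& xbuf, int xstart, std::uint32_t xoffs,
             const V8& zbuf, int zstart, std::uint32_t zoffs) {
    V8 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        const std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
        // at() throws std::out_of_range instead of wrapping around.
        out[i] = acc[i] + xbuf.at(xi) * zbuf.at(zi);
    }
    return out;
}

// Usage: a sliding window over xbuf multiplied by one broadcast coefficient.
void fpmac_example() {
    V16 x{};
    for (std::size_t i = 0; i < x.size(); ++i) x[i] = static_cast<float>(i);
    V8 z{};
    z[0] = 2.0f;
    V8 acc;
    acc.fill(1.0f);
    // xoffs 0x76543210 picks x[4..11]; zoffs 0 broadcasts z[0] to all lanes.
    const V8 r = fpmac_ref(acc, x, 4, 0x76543210u, z, 0, 0x00000000u);
    assert(r[0] == 9.0f);   // 1 + 4 * 2
    assert(r[7] == 23.0f);  // 1 + 11 * 2
}
```

Under these assumptions, an xoffs of 0x76543210 reads consecutive elements and an all-zero zoffs broadcasts a single coefficient, which is the access pattern of one FIR filter tap.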
 
v8float fpmac_abs (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply, take absolute value and accumulate for single precision real times real floating point vectors.
 
v8float fpmac_abs (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply, take absolute value and accumulate for single precision real times real floating point vectors.
 
v8float fpmac_abs (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply, take absolute value and accumulate for single precision real times real floating point vectors.
 
v8float fpmac_abs (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply, take absolute value and accumulate for single precision real times real floating point vectors.
 
v8float fpmac_abs (v8float acc, v8float xbuf, v8float zbuf)
 Multiply, take absolute value and accumulate for single precision real times real floating point vectors.
 
v8float fpmac_abs (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply, take absolute value and accumulate for single precision real times real floating point vectors.
 
v8float fpmac_abs (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply, take absolute value and accumulate for single precision real times real floating point vectors.
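
To illustrate how fpmac_abs differs from a plain multiply-accumulate, here is a small host-side sketch of the three-argument overload. It assumes lane i of the result is acc[i] + |xbuf[i] * zbuf[i]|, following the order given in the description (multiply, take absolute value, accumulate). The names V8, fpmac_abs_ref and fpmac_abs_example are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using V8 = std::array<float, 8>;  // hypothetical stand-in for v8float

// Scalar model of fpmac_abs(acc, xbuf, zbuf).
// Assumption: out[i] = acc[i] + |xbuf[i] * zbuf[i]|.
V8 fpmac_abs_ref(const V8& acc, const V8& xbuf, const V8& zbuf) {
    V8 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = acc[i] + std::fabs(xbuf[i] * zbuf[i]);
    }
    return out;
}

// Usage: two chained calls whose signed products would cancel each other.
void fpmac_abs_example() {
    const V8 x{1.0f, -1.0f, 2.0f, -2.0f, 3.0f, -3.0f, 4.0f, -4.0f};
    V8 plus_one, minus_one;
    plus_one.fill(1.0f);
    minus_one.fill(-1.0f);

    V8 acc{};                                 // starts at zero
    acc = fpmac_abs_ref(acc, x, plus_one);    // adds |x[i]|
    acc = fpmac_abs_ref(acc, x, minus_one);   // adds |x[i]| again
    assert(acc[0] == 2.0f);
    assert(acc[7] == 8.0f);
}
```

With a signed multiply-accumulate the two passes in the example would sum to zero; accumulating magnitudes instead gives twice |x[i]| per lane.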
 
v4cfloat fpmac_c (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmac_c (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmac_c (v4cfloat acc, v4float xbuf, v4cfloat zbuf)
 Multiply and accumulate for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmac_c (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmac_c (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmac_c (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmac_c (v4cfloat acc, v4cfloat xbuf, v8float zbuf)
 Multiply and accumulate for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmac_c (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmac_cc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmac_cc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmac_cc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmac_cc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmac_cc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
 Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmac_cc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmac_cc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmac_cn (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmac_cn (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmac_cn (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmac_cn (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmac_cn (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
 Multiply and accumulate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmac_cn (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmac_cn (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex conjugate times complex floating point vectors.
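
The complex variants fpmac, fpmac_cc, fpmac_cn and fpmac_nc differ, per their descriptions, only in which operand is conjugated before the multiply. The host-side sketch below models the three-argument complex times complex overloads with std::complex. The names cfloat_t, V4C, fpmac_conj_ref and fpmac_conj_example are hypothetical, and the two boolean flags are an illustration device rather than part of the intrinsic API.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat_t = std::complex<float>;  // hypothetical stand-in for cfloat
using V4C = std::array<cfloat_t, 4>;   // hypothetical stand-in for v4cfloat

// Scalar model of the (acc, xbuf, zbuf) complex overloads.
// Flag mapping taken from the descriptions on this page:
//   conj_x=false, conj_z=false : fpmac     (complex times complex)
//   conj_x=true,  conj_z=true  : fpmac_cc  (conjugate times conjugate)
//   conj_x=true,  conj_z=false : fpmac_cn  (conjugate times complex)
//   conj_x=false, conj_z=true  : fpmac_nc  (complex times conjugate)
V4C fpmac_conj_ref(const V4C& acc, const V4C& xbuf, const V4C& zbuf,
                   bool conj_x, bool conj_z) {
    V4C out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const cfloat_t x = conj_x ? std::conj(xbuf[i]) : xbuf[i];
        const cfloat_t z = conj_z ? std::conj(zbuf[i]) : zbuf[i];
        out[i] = acc[i] + x * z;
    }
    return out;
}

// Usage: the four variants applied to x = 1+2i and z = 3+4i in every lane.
void fpmac_conj_example() {
    V4C x, z;
    x.fill(cfloat_t(1.0f, 2.0f));
    z.fill(cfloat_t(3.0f, 4.0f));
    const V4C zero{};
    assert(fpmac_conj_ref(zero, x, z, false, false)[0] == cfloat_t(-5.0f, 10.0f));
    assert(fpmac_conj_ref(zero, x, z, true, true)[0] == cfloat_t(-5.0f, -10.0f));
    assert(fpmac_conj_ref(zero, x, z, true, false)[0] == cfloat_t(11.0f, -2.0f));
    assert(fpmac_conj_ref(zero, x, z, false, true)[0] == cfloat_t(11.0f, 2.0f));
}
```

The conjugate-times-complex form is the per-element product used in correlation-style computations, which is one reason to have it as a single operation rather than a separate conjugation step.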
 
v8float fpmac_conf (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v8float fpmac_conf (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_conf (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply and accumulate for single precision floating point vectors.
 
v4cfloat fpmac_nc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmac_nc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmac_nc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmac_nc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmac_nc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
 Multiply and accumulate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmac_nc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmac_nc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and accumulate for single precision complex times complex conjugate floating point vectors.
 
v8float fpmsc (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times real floating point vectors.
 
v8float fpmsc (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times real floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times complex floating point vectors.
 
v8float fpmsc (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times real floating point vectors.
 
v8float fpmsc (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times real floating point vectors.
 
v8float fpmsc (v8float acc, v8float xbuf, v8float zbuf)
 Multiply and subtract for single precision real times real floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times complex floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v4float xbuf, v4cfloat zbuf)
 Multiply and subtract for single precision real times complex floating point vectors.
 
v8float fpmsc (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times real floating point vectors.
 
v8float fpmsc (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times real floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times complex floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times real floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times real floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v4cfloat xbuf, v8float zbuf)
 Multiply and subtract for single precision complex times real floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
 Multiply and subtract for single precision complex times complex floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times real floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex floating point vectors.
 
v4cfloat fpmsc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex floating point vectors.
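
As a complement to fpmac, the fpmsc family subtracts the product from the accumulator. Below is a minimal host-side sketch of the real times complex overload fpmsc(v4cfloat acc, v4float xbuf, v4cfloat zbuf). It assumes lane i of the result is acc[i] - xbuf[i] * zbuf[i]. The names cfloat_t, V4, V4C, fpmsc_ref and fpmsc_example are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat_t = std::complex<float>;  // hypothetical stand-in for cfloat
using V4 = std::array<float, 4>;       // hypothetical stand-in for v4float
using V4C = std::array<cfloat_t, 4>;   // hypothetical stand-in for v4cfloat

// Scalar model of fpmsc(acc, xbuf, zbuf) with a real xbuf and complex zbuf.
// Assumption: out[i] = acc[i] - xbuf[i] * zbuf[i].
V4C fpmsc_ref(const V4C& acc, const V4& xbuf, const V4C& zbuf) {
    V4C out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // A real factor scales the real and imaginary parts alike.
        out[i] = acc[i] - xbuf[i] * zbuf[i];
    }
    return out;
}

// Usage: subtract 2 * (1+3i) from 10+10i in every lane.
void fpmsc_example() {
    V4C acc, z;
    V4 x;
    acc.fill(cfloat_t(10.0f, 10.0f));
    x.fill(2.0f);
    z.fill(cfloat_t(1.0f, 3.0f));
    const V4C r = fpmsc_ref(acc, x, z);
    assert(r[0] == cfloat_t(8.0f, 4.0f));
}
```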
 
v8float fpmsc_abs (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply, take absolute value and subtract for single precision real times real floating point vectors.
 
v8float fpmsc_abs (v8float acc, v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply, take absolute value and subtract for single precision real times real floating point vectors.
 
v8float fpmsc_abs (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply, take absolute value and subtract for single precision real times real floating point vectors.
 
v8float fpmsc_abs (v8float acc, v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply, take absolute value and subtract for single precision real times real floating point vectors.
 
v8float fpmsc_abs (v8float acc, v8float xbuf, v8float zbuf)
 Multiply, take absolute value and subtract for single precision real times real floating point vectors.
 
v8float fpmsc_abs (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply, take absolute value and subtract for single precision real times real floating point vectors.
 
v8float fpmsc_abs (v8float acc, v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply, take absolute value and subtract for single precision real times real floating point vectors.
 
v4cfloat fpmsc_c (v4cfloat acc, v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmsc_c (v4cfloat acc, v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmsc_c (v4cfloat acc, v4float xbuf, v4cfloat zbuf)
 Multiply and subtract for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmsc_c (v4cfloat acc, v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmsc_c (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmsc_c (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmsc_c (v4cfloat acc, v4cfloat xbuf, v8float zbuf)
 Multiply and subtract for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmsc_c (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmsc_cc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmsc_cc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmsc_cc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmsc_cc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmsc_cc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
 Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmsc_cc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmsc_cc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmsc_cn (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmsc_cn (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmsc_cn (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmsc_cn (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmsc_cn (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
 Multiply and subtract for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmsc_cn (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmsc_cn (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmsc_nc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmsc_nc (v4cfloat acc, v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmsc_nc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmsc_nc (v4cfloat acc, v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmsc_nc (v4cfloat acc, v4cfloat xbuf, v4cfloat zbuf)
 Multiply and subtract for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmsc_nc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmsc_nc (v4cfloat acc, v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and subtract for single precision complex times complex conjugate floating point vectors.
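
The _cc, _cn and _nc suffixes above differ only in which operand is conjugated before the product is subtracted from acc. The sketch below models that reading on the host with std::complex for the (acc, xbuf, zbuf) overloads. The ref_ names and the CVec4 stand-in for v4cfloat are hypothetical, and the mapping of suffix letters to operands (first letter for x, second for z, c for conjugate, n for normal) is an assumption drawn from the briefs.

```cpp
#include <array>
#include <complex>
#include <cstddef>

// Hypothetical host-side stand-in for v4cfloat; not the AIE type.
using CVec4 = std::array<std::complex<float>, 4>;

// Scalar model (assumed semantics): out[i] = acc[i] - x'[i] * z'[i],
// where x' and z' are optionally conjugated copies of x and z.
CVec4 ref_fpmsc(const CVec4& acc, const CVec4& x, const CVec4& z,
                bool conj_x, bool conj_z) {
    CVec4 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::complex<float> xv = conj_x ? std::conj(x[i]) : x[i];
        const std::complex<float> zv = conj_z ? std::conj(z[i]) : z[i];
        out[i] = acc[i] - xv * zv;
    }
    return out;
}

// Suffix wrappers: first letter applies to x, second to z (assumed).
CVec4 ref_fpmsc_cc(const CVec4& a, const CVec4& x, const CVec4& z) { return ref_fpmsc(a, x, z, true, true); }
CVec4 ref_fpmsc_cn(const CVec4& a, const CVec4& x, const CVec4& z) { return ref_fpmsc(a, x, z, true, false); }
CVec4 ref_fpmsc_nc(const CVec4& a, const CVec4& x, const CVec4& z) { return ref_fpmsc(a, x, z, false, true); }
```

For example, with acc = 10, x = 1 + 2i and z = 3 + 4i in a lane, this model gives -1 + 2i for _cn, -1 - 2i for _nc and 15 + 10i for _cc.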
 
v8float fpmul (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision real times real floating point vectors.
 
v8float fpmul (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision real times real floating point vectors.
 
v4cfloat fpmul (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision real times complex floating point vectors.
 
v8float fpmul (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision real times real floating point vectors.
 
v8float fpmul (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision real times real floating point vectors.
 
v8float fpmul (v8float xbuf, v8float zbuf)
 Multiply for single precision real times real floating point vectors.
 
v4cfloat fpmul (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision real times complex floating point vectors.
 
v4cfloat fpmul (v4float xbuf, v4cfloat zbuf)
 Multiply for single precision real times complex floating point vectors.
 
v8float fpmul (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision real times real floating point vectors.
 
v8float fpmul (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision real times real floating point vectors.
 
v4cfloat fpmul (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision real times complex floating point vectors.
 
v4cfloat fpmul (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex times real floating point vectors.
 
v4cfloat fpmul (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex floating point vectors.
 
v4cfloat fpmul (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex floating point vectors.
 
v4cfloat fpmul (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex times real floating point vectors.
 
v4cfloat fpmul (v4cfloat xbuf, v8float zbuf)
 Multiply for single precision complex times real floating point vectors.
 
v4cfloat fpmul (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex floating point vectors.
 
v4cfloat fpmul (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex floating point vectors.
 
v4cfloat fpmul (v4cfloat xbuf, v4cfloat zbuf)
 Multiply for single precision complex times complex floating point vectors.
 
v4cfloat fpmul (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex times real floating point vectors.
 
v4cfloat fpmul (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex floating point vectors.
 
v4cfloat fpmul (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex floating point vectors.
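
The indexed fpmul overloads above take xstart/xoffs and zstart/zoffs. The sketch below is a host-side scalar model of one possible reading of that lane selection for the real times real case: it assumes each 4-bit nibble of the offsets word (least significant nibble for lane 0) is added to the start index to pick the source element. That scheme, the name ref_fpmul, the bounds checks, and the std::array stand-ins for the vector types are assumptions for illustration; the lane selection description that accompanies each intrinsic definition is authoritative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for v8float; not the AIE type.
using Vec8 = std::array<float, 8>;

// Scalar model (assumed lane selection):
//   out[lane] = xbuf[xstart + nibble(xoffs, lane)] * zbuf[zstart + nibble(zoffs, lane)]
// N is the x buffer length (8, 16 or 32 in the listed overloads).
template <std::size_t N>
Vec8 ref_fpmul(const std::array<float, N>& xbuf, int xstart, unsigned int xoffs,
               const Vec8& zbuf, int zstart, unsigned int zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    Vec8 out{};
    for (std::size_t lane = 0; lane < out.size(); ++lane) {
        const unsigned int shift = 4u * static_cast<unsigned int>(lane);
        const std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> shift) & 0xFu);
        const std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> shift) & 0xFu);
        // The model rejects out-of-range selections; device behavior is not modeled.
        assert(xi < N && zi < zbuf.size());
        out[lane] = xbuf[xi] * zbuf[zi];
    }
    return out;
}
```

Under this assumption, xoffs = 0x76543210 selects eight consecutive elements starting at xstart, and an offsets word of 0 broadcasts the single element at the start index to all lanes.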
 
v4cfloat fpmul_c (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmul_c (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmul_c (v4float xbuf, v4cfloat zbuf)
 Multiply for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmul_c (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpmul_c (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmul_c (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmul_c (v4cfloat xbuf, v8float zbuf)
 Multiply for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmul_c (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpmul_cc (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmul_cc (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmul_cc (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmul_cc (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmul_cc (v4cfloat xbuf, v4cfloat zbuf)
 Multiply for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmul_cc (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmul_cc (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpmul_cn (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmul_cn (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmul_cn (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmul_cn (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmul_cn (v4cfloat xbuf, v4cfloat zbuf)
 Multiply for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmul_cn (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpmul_cn (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex conjugate times complex floating point vectors.
 
v8float fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v8float fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode, unsigned int &cmp)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_conf (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs, bool ones, bool abs, unsigned int addmode, unsigned int addmask, unsigned int cmpmode)
 Fully configurable multiply for single precision floating point vectors.
 
v4cfloat fpmul_nc (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmul_nc (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmul_nc (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmul_nc (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmul_nc (v4cfloat xbuf, v4cfloat zbuf)
 Multiply for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmul_nc (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpmul_nc (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply for single precision complex times complex conjugate floating point vectors.
 
v8float fpneg_abs_mul (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply, take absolute value and negate for single precision real times real floating point vectors.
 
v8float fpneg_abs_mul (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply, take absolute value and negate for single precision real times real floating point vectors.
 
v8float fpneg_abs_mul (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply, take absolute value and negate for single precision real times real floating point vectors.
 
v8float fpneg_abs_mul (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply, take absolute value and negate for single precision real times real floating point vectors.
 
v8float fpneg_abs_mul (v8float xbuf, v8float zbuf)
 Multiply, take absolute value and negate for single precision real times real floating point vectors.
 
v8float fpneg_abs_mul (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply, take absolute value and negate for single precision real times real floating point vectors.
 
v8float fpneg_abs_mul (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply, take absolute value and negate for single precision real times real floating point vectors.
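
Read literally, the fpneg_abs_mul brief gives -|x * z| per lane, so no lane of the result is positive. The sketch below is a host-side scalar model of the (xbuf, zbuf) overload under that reading. The name ref_fpneg_abs_mul and the Vec8 stand-in for v8float are hypothetical, and the model does not attempt to reproduce device rounding or exception behavior.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical host-side stand-in for v8float; not the AIE type.
using Vec8 = std::array<float, 8>;

// Scalar model (assumed semantics): out[i] = -|x[i] * z[i]|.
Vec8 ref_fpneg_abs_mul(const Vec8& x, const Vec8& z) {
    Vec8 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = -std::fabs(x[i] * z[i]);
    }
    return out;
}
```

The sign of either input does not affect the result in this model: lanes holding 2 * 3 and -2 * 3 both produce -6.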
 
v8float fpneg_mul (v8float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times real floating point vectors.
 
v8float fpneg_mul (v8float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times real floating point vectors.
 
v4cfloat fpneg_mul (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times complex floating point vectors.
 
v8float fpneg_mul (v16float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times real floating point vectors.
 
v8float fpneg_mul (v16float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times real floating point vectors.
 
v8float fpneg_mul (v8float xbuf, v8float zbuf)
 Multiply and negate for single precision real times real floating point vectors.
 
v4cfloat fpneg_mul (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times complex floating point vectors.
 
v4cfloat fpneg_mul (v4float xbuf, v4cfloat zbuf)
 Multiply and negate for single precision real times complex floating point vectors.
 
v8float fpneg_mul (v32float xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times real floating point vectors.
 
v8float fpneg_mul (v32float xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times real floating point vectors.
 
v4cfloat fpneg_mul (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times complex floating point vectors.
 
v4cfloat fpneg_mul (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times real floating point vectors.
 
v4cfloat fpneg_mul (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex floating point vectors.
 
v4cfloat fpneg_mul (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex floating point vectors.
 
v4cfloat fpneg_mul (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times real floating point vectors.
 
v4cfloat fpneg_mul (v4cfloat xbuf, v8float zbuf)
 Multiply and negate for single precision complex times real floating point vectors.
 
v4cfloat fpneg_mul (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex floating point vectors.
 
v4cfloat fpneg_mul (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex floating point vectors.
 
v4cfloat fpneg_mul (v4cfloat xbuf, v4cfloat zbuf)
 Multiply and negate for single precision complex times complex floating point vectors.
 
v4cfloat fpneg_mul (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times real floating point vectors.
 
v4cfloat fpneg_mul (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex floating point vectors.
 
v4cfloat fpneg_mul (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex floating point vectors.
 
v4cfloat fpneg_mul_c (v8float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_c (v16float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_c (v4float xbuf, v4cfloat zbuf)
 Multiply and negate for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_c (v32float xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision real times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_c (v4cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpneg_mul_c (v8cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpneg_mul_c (v4cfloat xbuf, v8float zbuf)
 Multiply and negate for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpneg_mul_c (v16cfloat xbuf, int xstart, unsigned int xoffs, v8float zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times real floating point vectors.
 
v4cfloat fpneg_mul_cc (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_cc (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_cc (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_cc (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_cc (v4cfloat xbuf, v4cfloat zbuf)
 Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_cc (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_cc (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_cn (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpneg_mul_cn (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpneg_mul_cn (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpneg_mul_cn (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpneg_mul_cn (v4cfloat xbuf, v4cfloat zbuf)
 Multiply and negate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpneg_mul_cn (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpneg_mul_cn (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex conjugate times complex floating point vectors.
 
v4cfloat fpneg_mul_nc (v4cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_nc (v4cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_nc (v8cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_nc (v8cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_nc (v4cfloat xbuf, v4cfloat zbuf)
 Multiply and negate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_nc (v16cfloat xbuf, int xstart, unsigned int xoffs, v4cfloat zbuf, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex conjugate floating point vectors.
 
v4cfloat fpneg_mul_nc (v16cfloat xbuf, int xstart, unsigned int xoffs, int zstart, unsigned int zoffs)
 Multiply and negate for single precision complex times complex conjugate floating point vectors.
 

Function Documentation

v8float fpabs_mul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
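As a rough illustration of the lane selection in the pseudocode above, the following plain C++ sketch models it on the host. It is not the intrinsic: `lane_offset` and `fpabs_mul_model` are hypothetical names, `std::array` stands in for the vector types, lane 0 is assumed to occupy the least-significant 4 bits of `xoffs` and `zoffs`, and indices are wrapped modulo the buffer length only to keep the model in bounds.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: extract the 4-bit offset of one lane.
// Assumption: lane 0 is stored in the least-significant nibble.
inline unsigned lane_offset(unsigned int offs, int lane) {
    return (offs >> (4 * lane)) & 0xFu;
}

// Scalar reference model of the pseudocode (not the intrinsic itself).
// The modulo only keeps this model in bounds; it is not a statement
// about what the hardware does with out-of-range indices.
template <std::size_t NX, std::size_t NZ>
std::array<float, 8> fpabs_mul_model(const std::array<float, NX>& xbuf,
                                     int xstart, unsigned int xoffs,
                                     const std::array<float, NZ>& zbuf,
                                     int zstart, unsigned int zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        std::size_t xi = (xstart + lane_offset(xoffs, i)) % NX;
        std::size_t zi = (zstart + lane_offset(zoffs, i)) % NZ;
        ret[i] = std::fabs(xbuf[xi] * zbuf[zi]);
    }
    return ret;
}
```

In this model, passing `0x76543210` for both offset words selects lane `i` of each buffer, while `zstart = 4` with `zoffs = 0` multiplies every X lane by `zbuf[4]`.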
v8float fpabs_mul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
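In the single-buffer form both operands are read from `xbuf`, so `zstart` and `zoffs` simply select a second set of elements from the same buffer. A minimal host-side sketch, assuming lane 0 uses the least-significant nibble and using the hypothetical name `fpabs_mul_self_model` (it is not part of the intrinsic API):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar model of the single-buffer form: both operands come from xbuf.
// Indices wrap modulo N only so that the model never reads out of bounds.
template <std::size_t N>
std::array<float, 8> fpabs_mul_self_model(const std::array<float, N>& xbuf,
                                          int xstart, unsigned int xoffs,
                                          int zstart, unsigned int zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        unsigned xo = (xoffs >> (4 * i)) & 0xFu;  // per-lane offset, first operand
        unsigned zo = (zoffs >> (4 * i)) & 0xFu;  // per-lane offset, second operand
        ret[i] = std::fabs(xbuf[(xstart + xo) % N] * xbuf[(zstart + zo) % N]);
    }
    return ret;
}
```

With identical start and offset values for both operands the model returns the squares of the selected elements; with `zstart = 1` it returns the absolute products of neighbouring elements.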
v8float fpabs_mul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
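Because each lane offset is 4 bits wide, it covers the values 0 to 15, which is enough to reach any element of a 16-element X buffer when `xstart` is 0. The helper below is a hypothetical convenience for building such an offset word from per-lane values; it assumes lane 0 is stored in the least-significant nibble and is not part of the intrinsic API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack eight per-lane offsets (each 0..15) into one 32-bit word.
// Assumption: lane 0 is stored in the least-significant nibble.
inline std::uint32_t pack_offsets(const std::array<unsigned, 8>& lanes) {
    std::uint32_t word = 0;
    for (int i = 0; i < 8; ++i) {
        assert(lanes[i] <= 0xFu);  // each offset must fit in 4 bits
        word |= static_cast<std::uint32_t>(lanes[i]) << (4 * i);
    }
    return word;
}

// Inverse operation: recover the offset of a single lane.
inline unsigned unpack_offset(std::uint32_t word, int lane) {
    return (word >> (4 * lane)) & 0xFu;
}
```

Under that assumption, the per-lane offsets 0, 2, 4, ..., 14 (every even element of a 16-element buffer) pack to `0xECA86420`.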
v8float fpabs_mul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v8float fpabs_mul ( v8float  xbuf,
v8float  zbuf 
)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = abs(xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v8float fpabs_mul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpabs_mul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and take absolute value for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v8float fpmac ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
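The multiply-accumulate pseudocode can be modelled the same way on the host. The sketch below is a scalar stand-in, not the intrinsic: `fpmac_model` is a hypothetical name, lane 0 is assumed to use the least-significant nibble of each offset word, and the modulo exists only to keep the model in bounds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of
//   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
template <std::size_t NX, std::size_t NZ>
std::array<float, 8> fpmac_model(const std::array<float, 8>& acc,
                                 const std::array<float, NX>& xbuf,
                                 int xstart, unsigned int xoffs,
                                 const std::array<float, NZ>& zbuf,
                                 int zstart, unsigned int zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        unsigned xo = (xoffs >> (4 * i)) & 0xFu;  // lane offset into X
        unsigned zo = (zoffs >> (4 * i)) & 0xFu;  // lane offset into Z
        ret[i] = acc[i] + xbuf[(xstart + xo) % NX] * zbuf[(zstart + zo) % NZ];
    }
    return ret;
}
```

Setting `zoffs` to 0 makes every lane read `zbuf[zstart]`, which in this model scales the selected X elements by a single coefficient before adding them to `acc`.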
v8float fpmac ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v4cfloat fpmac ( v4cfloat  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
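For the real times complex form, the Z offsets count complex lanes and each 4-bit field must keep its highest bit clear. The following host-side sketch models that under stated assumptions: `fpmac_real_cplx_model` and `cfloat` are hypothetical names, `std::complex<float>` stands in for one complex lane, lane 0 is assumed to use the least-significant nibble, and indices wrap only to keep the model in bounds.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;  // stand-in for one complex lane

// Hypothetical scalar model of the 4-lane real x complex multiply-accumulate.
template <std::size_t NX>
std::array<cfloat, 4> fpmac_real_cplx_model(const std::array<cfloat, 4>& acc,
                                            const std::array<float, NX>& xbuf,
                                            int xstart, unsigned int xoffs,
                                            const std::array<cfloat, 4>& zbuf,
                                            int zstart, unsigned int zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    std::array<cfloat, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        unsigned xo = (xoffs >> (4 * i)) & 0xFu;
        unsigned zo = (zoffs >> (4 * i)) & 0xFu;
        assert((zo & 0x8u) == 0u);  // documented constraint: highest bit is 0, so zo is in [0,7]
        float x = xbuf[(xstart + xo) % NX];
        cfloat z = zbuf[(zstart + zo) % 4];  // offset counts complex lanes, not floats
        ret[i] = acc[i] + x * z;
    }
    return ret;
}
```

Each result lane is the accumulator plus a real value scaling both the real and imaginary parts of the selected complex element.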
v8float fpmac ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpmac ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v8float fpmac ( v8float  acc,
v8float  xbuf,
v8float  zbuf 
)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmac ( v4cfloat  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac ( v4cfloat  acc,
v4float  xbuf,
v4cfloat  zbuf 
)

Multiply and accumulate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v8float fpmac ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
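One way the 32-element X form might be used is a short FIR filter: a sliding window over X is selected with `xstart`, and one coefficient per call is broadcast from Z with `zoffs = 0`. The sketch below is a host-side model only. `fpmac_model` and `fir4_model` are hypothetical names, lane 0 is assumed to use the least-significant nibble, and the loop over `k` is a modelling convenience (since `zstart` must be a compile time constant, real code would presumably need the calls unrolled).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of fpmac with a 32-element X buffer and an 8-element Z buffer.
inline std::array<float, 8> fpmac_model(const std::array<float, 8>& acc,
                                        const std::array<float, 32>& xbuf,
                                        int xstart, unsigned int xoffs,
                                        const std::array<float, 8>& zbuf,
                                        int zstart, unsigned int zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        unsigned xo = (xoffs >> (4 * i)) & 0xFu;
        unsigned zo = (zoffs >> (4 * i)) & 0xFu;
        // Modulo only keeps the model in bounds.
        ret[i] = acc[i] + xbuf[(xstart + xo) % 32] * zbuf[(zstart + zo) % 8];
    }
    return ret;
}

// 4-tap FIR over 8 output lanes: y[i] = sum over k of coeff[k] * x[i + k].
inline std::array<float, 8> fir4_model(const std::array<float, 32>& x,
                                       const std::array<float, 8>& coeff) {
    std::array<float, 8> acc{};  // starts at zero
    for (int k = 0; k < 4; ++k) {
        // Window shifted by k samples, multiplied by coefficient k broadcast to all lanes.
        acc = fpmac_model(acc, x, k, 0x76543210u, coeff, k, 0x00000000u);
    }
    return acc;
}
```

Chaining the accumulator through successive calls is what turns the per-lane multiply-accumulate into a dot product across taps.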
v8float fpmac ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v4cfloat fpmac ( v4cfloat  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmac ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
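The two-cycle note for the complex times complex form mentions adding the two terms of the complex multiplication. One possible decomposition, shown here purely as an assumption since the exact split and the corresponding fpmac_conf settings are not given on this page, is to accumulate Re(x) * z in a first pass and Im(x) * (j * z) in a second. The names `pass_real`, `pass_imag` and `fpmac_cc_two_pass` are hypothetical, and lane selection is omitted to keep the sketch short.

```cpp
#include <array>
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;  // stand-in for one complex lane

// Pass 1 (assumed split): accumulate Re(x) * z.
inline std::array<cfloat, 4> pass_real(const std::array<cfloat, 4>& acc,
                                       const std::array<cfloat, 4>& x,
                                       const std::array<cfloat, 4>& z) {
    std::array<cfloat, 4> ret{};
    for (int i = 0; i < 4; ++i) ret[i] = acc[i] + x[i].real() * z[i];
    return ret;
}

// Pass 2 (assumed split): accumulate Im(x) * (j * z), where j * z = (-Im(z), Re(z)).
inline std::array<cfloat, 4> pass_imag(const std::array<cfloat, 4>& acc,
                                       const std::array<cfloat, 4>& x,
                                       const std::array<cfloat, 4>& z) {
    std::array<cfloat, 4> ret{};
    for (int i = 0; i < 4; ++i)
        ret[i] = acc[i] + x[i].imag() * cfloat(-z[i].imag(), z[i].real());
    return ret;
}

// Both passes together give acc[i] + x[i] * z[i].
inline std::array<cfloat, 4> fpmac_cc_two_pass(const std::array<cfloat, 4>& acc,
                                               const std::array<cfloat, 4>& x,
                                               const std::array<cfloat, 4>& z) {
    return pass_imag(pass_real(acc, x, z), x, z);
}
```

The identity behind the split is x * z = Re(x) * z + Im(x) * (j * z), so each pass is a real times complex multiply-accumulate.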
v4cfloat fpmac ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmac ( v4cfloat  acc,
v4cfloat  xbuf,
v8float  zbuf 
)

Multiply and accumulate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmac ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac ( v4cfloat  acc,
v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmac ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmac ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v8float fpmac_abs ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
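A scalar reference for the multiply, absolute value, accumulate pattern can help when validating offset words on a host. This is a sketch under stated assumptions: `ref_fpmac_abs` is a hypothetical name, lane i is assumed to read the i-th nibble of each offset word (lane 0 lowest), and out-of-range indices are rejected instead of wrapped. The X buffer size is a template parameter so the same model covers wider X buffers.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Scalar model: ret[i] = acc[i] + |xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]|.
// Assumption: lane i's 4-bit offset sits in bits [4*i+3 : 4*i] of the word.
template <std::size_t NX>
std::array<float, 8> ref_fpmac_abs(const std::array<float, 8>& acc,
                                   const std::array<float, NX>& xbuf,
                                   int xstart, std::uint32_t xoffs,
                                   const std::array<float, 8>& zbuf,
                                   int zstart, std::uint32_t zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    std::array<float, 8> ret{};
    for (unsigned i = 0; i < 8; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4u * i)) & 0xFu);
        const std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4u * i)) & 0xFu);
        assert(xi < NX && zi < zbuf.size());  // the model rejects out-of-range indices
        ret[i] = acc[i] + std::fabs(xbuf[xi] * zbuf[zi]);
    }
    return ret;
}
```

Setting zoffs to 0 broadcasts element zstart of Z to all eight lanes, which is a common way to scale a vector by one coefficient.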
v8float fpmac_abs ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset of the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.
v8float fpmac_abs ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpmac_abs ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset of the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.
v8float fpmac_abs ( v8float  acc,
v8float  xbuf,
v8float  zbuf 
)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + abs(xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
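Read against the pseudo-code, this plain form behaves like the offset form with both start values 0 and an identity lane mapping. The sketch below checks that on a scalar model. The function names are hypothetical, and the identity offset word 0x76543210 relies on the assumption that lane 0 uses the least significant nibble.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

using vec8 = std::array<float, 8>;

// Plain form: ret[i] = acc[i] + |xbuf[i] * zbuf[i]|.
vec8 ref_fpmac_abs_plain(const vec8& acc, const vec8& xbuf, const vec8& zbuf) {
    vec8 ret{};
    for (std::size_t i = 0; i < 8; ++i)
        ret[i] = acc[i] + std::fabs(xbuf[i] * zbuf[i]);
    return ret;
}

// Offset form (assumption: lane i reads nibble i, lane 0 lowest).
vec8 ref_fpmac_abs_offs(const vec8& acc, const vec8& xbuf, int xstart, std::uint32_t xoffs,
                        const vec8& zbuf, int zstart, std::uint32_t zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    vec8 ret{};
    for (unsigned i = 0; i < 8; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4u * i)) & 0xFu);
        const std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4u * i)) & 0xFu);
        assert(xi < 8 && zi < 8);  // no index wrapping in this model
        ret[i] = acc[i] + std::fabs(xbuf[xi] * zbuf[zi]);
    }
    return ret;
}

// True when both forms agree lane by lane for identity offsets.
bool plain_matches_identity_offsets(const vec8& acc, const vec8& x, const vec8& z) {
    return ref_fpmac_abs_plain(acc, x, z) ==
           ref_fpmac_abs_offs(acc, x, 0, 0x76543210u, z, 0, 0x76543210u);
}
```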
v8float fpmac_abs ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpmac_abs ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and accumulate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] + abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset of the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.
v4cfloat fpmac_c ( v4cfloat  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
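For the real times complex conjugate case, the X offsets index real elements while the Z offsets index complex elements. A scalar model makes that difference explicit. This is only a sketch: `ref_fpmac_c_real_conj` is a hypothetical name, the nibble order (lane 0 lowest) is an assumption, and indices are asserted to be in range rather than wrapped.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;

// Scalar model: ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]).
// xbuf holds NX real values; zbuf holds 4 complex values.
template <std::size_t NX>
std::array<cfloat, 4> ref_fpmac_c_real_conj(const std::array<cfloat, 4>& acc,
                                            const std::array<float, NX>& xbuf,
                                            int xstart, std::uint32_t xoffs,
                                            const std::array<cfloat, 4>& zbuf,
                                            int zstart, std::uint32_t zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        // Assumption: lane i reads the i-th nibble, lane 0 being the lowest.
        const std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4u * i)) & 0xFu);
        const std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4u * i)) & 0xFu);
        assert(xi < NX && zi < zbuf.size());  // real index for X, complex index for Z
        ret[i] = acc[i] + xbuf[xi] * std::conj(zbuf[zi]);
    }
    return ret;
}
```

Because X is real, conjugating Z only flips the sign of the imaginary part that each lane contributes.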
v4cfloat fpmac_c ( v4cfloat  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_c ( v4cfloat  acc,
v4float  xbuf,
v4cfloat  zbuf 
)

Multiply and accumulate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmac_c ( v4cfloat  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_c ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmac_c ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmac_c ( v4cfloat  acc,
v4cfloat  xbuf,
v8float  zbuf 
)

Multiply and accumulate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
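In this overload the Z buffer has eight real elements, while the pseudo-code loops over four lanes only, so it reads just zbuf[0] to zbuf[3]. The scalar sketch below mirrors that reading. `ref_fpmac_c_conj_real` is a hypothetical name, and the model follows the pseudo-code literally rather than describing hardware behavior.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;

// Scalar model: ret[i] = acc[i] + conj(xbuf[i]) * zbuf[i] for i = 0..3.
// Elements zbuf[4..7] are never read, as in the pseudo-code.
std::array<cfloat, 4> ref_fpmac_c_conj_real(const std::array<cfloat, 4>& acc,
                                            const std::array<cfloat, 4>& xbuf,
                                            const std::array<float, 8>& zbuf) {
    std::array<cfloat, 4> ret{};
    for (std::size_t i = 0; i < 4; ++i)
        ret[i] = acc[i] + std::conj(xbuf[i]) * zbuf[i];
    return ret;
}

// Debug helper: checks that the upper half of zbuf has no influence on the result.
bool upper_half_is_ignored(const std::array<cfloat, 4>& acc,
                           const std::array<cfloat, 4>& xbuf,
                           std::array<float, 8> zbuf) {
    const auto before = ref_fpmac_c_conj_real(acc, xbuf, zbuf);
    for (std::size_t i = 4; i < 8; ++i) zbuf[i] += 1.0f;  // perturb unused lanes
    const auto after = ref_fpmac_c_conj_real(acc, xbuf, zbuf);
    assert(before == after);
    return before == after;
}
```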
v4cfloat fpmac_c ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmac_cc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
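The two terms of the complex multiplication mentioned in the two-cycle note can be written out explicitly. With x = a + jb and z = c + jd, conj(x) * conj(z) = (ac - bd) - j(ad + bc), which splits into a term driven by a and a term driven by b. The sketch below shows only this arithmetic split on a scalar model. The function names are hypothetical, and the actual fpmac_conf arguments for each pass are not shown here.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;
using cvec4 = std::array<cfloat, 4>;

// One-step model: ret[i] = acc[i] + conj(x[i]) * conj(z[i]).
cvec4 ref_fpmac_cc(const cvec4& acc, const cvec4& x, const cvec4& z) {
    cvec4 ret{};
    for (std::size_t i = 0; i < 4; ++i)
        ret[i] = acc[i] + std::conj(x[i]) * std::conj(z[i]);
    return ret;
}

// Two-pass model: each pass adds one term built from real multiplications only.
cvec4 ref_fpmac_cc_two_pass(const cvec4& acc, const cvec4& x, const cvec4& z) {
    cvec4 ret = acc;
    for (std::size_t i = 0; i < 4; ++i) {
        const float a = x[i].real(), b = x[i].imag();
        const float c = z[i].real(), d = z[i].imag();
        // Pass 1: real(x) * conj(z) = (a*c) + j(-a*d)
        ret[i] += cfloat(a * c, -(a * d));
        // Pass 2: -j*imag(x) * conj(z) = (-b*d) + j(-b*c)
        ret[i] += cfloat(-(b * d), -(b * c));
    }
    return ret;
}
```

For inputs that are exactly representable, such as small integers, both models give identical results. With general floating-point data the two orderings may round differently.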
v4cfloat fpmac_cc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_cc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
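The constraint that the highest bit of every 4-bit lane offset must be 0 is easy to check before an offset word is passed on. The helpers below are a hypothetical host-side sketch. They assume the four complex lanes occupy the low 16 bits of the word with lane 0 in the lowest nibble, and they ignore any higher bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: true when each of the four 4-bit lane offsets has its
// top bit clear, i.e. lies in [0,7]. Bits above the low 16 are not inspected.
constexpr bool complex_offsets_valid(std::uint32_t offs) {
    return (offs & 0x8888u) == 0u;
}

// Returns offs unchanged after asserting validity; intended for debug builds.
inline std::uint32_t checked_complex_offsets(std::uint32_t offs) {
    assert(complex_offsets_valid(offs));
    return offs;
}

static_assert(complex_offsets_valid(0x3210u), "identity mapping is in range");
static_assert(!complex_offsets_valid(0x0080u), "offset 8 in lane 1 is out of range");
```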
v4cfloat fpmac_cc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_cc ( v4cfloat  acc,
v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[i]) * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmac_cc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_cc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_cn ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_cn ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
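One consequence of the single-buffer pseudo-code: when both operands select the same element (xstart equal to zstart and xoffs equal to zoffs), each lane accumulates conj(x) * x = |x|^2, a purely real value. The scalar sketch below illustrates this. `ref_fpmac_cn_xx` is a hypothetical name, lane 0 is assumed to use the lowest nibble, and indices are asserted to be in range rather than wrapped.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;
using cvec4 = std::array<cfloat, 4>;

// Scalar model: ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]].
cvec4 ref_fpmac_cn_xx(const cvec4& acc, const cvec4& xbuf,
                      int xstart, std::uint32_t xoffs,
                      int zstart, std::uint32_t zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    cvec4 ret{};
    for (unsigned i = 0; i < 4; ++i) {
        // Assumption: lane i reads the i-th nibble, lane 0 being the lowest.
        const std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4u * i)) & 0xFu);
        const std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4u * i)) & 0xFu);
        assert(xi < xbuf.size() && zi < xbuf.size());  // no index wrapping in this model
        ret[i] = acc[i] + std::conj(xbuf[xi]) * xbuf[zi];
    }
    return ret;
}

// Per-lane power accumulation: same element selected for both operands.
cvec4 ref_accumulate_power(const cvec4& acc, const cvec4& xbuf) {
    return ref_fpmac_cn_xx(acc, xbuf, 0, 0x3210u, 0, 0x3210u);
}
```

For example, the element 3 + 4j contributes 25 to the real part of its lane and nothing to the imaginary part.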
v4cfloat fpmac_cn ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_cn ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_cn ( v4cfloat  acc,
v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding one of the two terms of the complex multiplication in each call.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
zbuf: Second multiplication input buffer.
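For checking results on a host machine, the lane-wise formula above can be modelled in plain C++. This is a sketch rather than the intrinsic itself: vec4c is a hypothetical stand-in for v4cfloat built on std::complex, and fpmac_cn_ref is an illustrative name.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;
using vec4c  = std::array<cfloat, 4>;  // hypothetical stand-in for v4cfloat

// Scalar model of: ret[i] = acc[i] + conj(xbuf[i]) * zbuf[i]
inline vec4c fpmac_cn_ref(const vec4c& acc, const vec4c& xbuf, const vec4c& zbuf) {
    vec4c ret{};
    for (std::size_t i = 0; i < 4; ++i)
        ret[i] = acc[i] + std::conj(xbuf[i]) * zbuf[i];
    return ret;
}
```

For example, with acc = 1+1j, x = 1+2j and z = 3+4j in a lane, the model yields (1+1j) + (1-2j)(3+4j) = 12-1j.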
v4cfloat fpmac_cn ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
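The lane selection in the formula above can be sketched as a scalar model. The following is an assumption-laden illustration, not the intrinsic: fpmac_cn_ref is a hypothetical name, std::array of std::complex stands in for the vector types, lane i is assumed to use nibble i of the packed offsets, and any wrap-around of out-of-range indices is not modelled.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;

// N is the number of complex elements in the X buffer (16 for v16cfloat).
template <std::size_t N>
std::array<cfloat, 4> fpmac_cn_ref(const std::array<cfloat, 4>& acc,
                                   const std::array<cfloat, N>& xbuf,
                                   int xstart, std::uint32_t xoffs,
                                   const std::array<cfloat, 4>& zbuf,
                                   int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        // start + per-lane nibble, both counted in complex elements
        const std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        const std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < N && zi < 4);  // the sketch does not model wrap-around
        ret[i] = acc[i] + std::conj(xbuf[xi]) * zbuf[zi];
    }
    return ret;
}
```

With xstart = 4 and xoffs = 0x3210, lanes 0 to 3 read X elements 4, 5, 6 and 7.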
v4cfloat fpmac_cn ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it results in two microcode instructions being scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
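The two-cycle note for fpmac_cn says the result can be built from two real multiply-accumulate passes. One possible split, shown here as a scalar sketch for a single complex lane, follows from conj(x) * z = (xr*zr + xi*zi) + j(xr*zi - xi*zr). The function name mac_cn_two_pass is hypothetical, and the exact arguments the hardware sequence uses are not stated in the text.

```cpp
#include <complex>

using cfloat = std::complex<float>;

// Computes acc + conj(x) * z as two real multiply-accumulate passes over the
// (real, imaginary) lane pair of one complex lane.
inline cfloat mac_cn_two_pass(cfloat acc, cfloat x, cfloat z) {
    float re = acc.real();
    float im = acc.imag();
    // Pass 1: both lanes multiply Re(x) with Z, plain addition.
    re += x.real() * z.real();
    im += x.real() * z.imag();
    // Pass 2: both lanes multiply Im(x) with the swapped parts of Z; the
    // imaginary lane is subtracted because of the conjugate.
    re += x.imag() * z.imag();
    im -= x.imag() * z.real();
    return cfloat(re, im);
}
```

In the second pass only the odd (imaginary) lanes are negated, which is the lane pattern 0xAA that the fpmac_conf pseudocode associates with fpadd_mixadd.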
v8float fpmac_conf ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set if the accumulator was chosen (per lane).
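The pseudocode for fpmac_conf can be transcribed almost line by line into a scalar reference model, which may help when validating kernel output on a host. This is a sketch under stated assumptions: AddMode and CmpMode are hypothetical stand-ins for the fpadd_* and fpcmp_* constants (whose numeric values are not given here), std::array<float, 8> stands in for v8float, lane i is assumed to use nibble i of the offsets and bit i of the masks, and index wrap-around is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

enum class AddMode { add, sub, mixadd, mixsub };  // hypothetical, mirrors fpadd_*
enum class CmpMode { nrm, lt, ge };               // hypothetical, mirrors fpcmp_*
using vec8 = std::array<float, 8>;

inline unsigned nib(std::uint32_t w, unsigned i) { return (w >> (4 * i)) & 0xFu; }

inline vec8 fpmac_conf_ref(const vec8& acc, const vec8& xbuf, int xstart, std::uint32_t xoffs,
                           const vec8& zbuf, int zstart, std::uint32_t zoffs,
                           bool ones, bool abs, AddMode addmode, unsigned addmask,
                           CmpMode cmpmode, unsigned& cmp) {
    // neg = addmask ^ {0x00, 0xFF, 0xAA, 0x55} depending on addmode
    const unsigned flip = addmode == AddMode::add ? 0x00u
                        : addmode == AddMode::sub ? 0xFFu
                        : addmode == AddMode::mixadd ? 0xAAu : 0x55u;
    const unsigned neg = (addmask ^ flip) & 0xFFu;
    vec8 ret{};
    cmp = 0;
    for (unsigned i = 0; i < 8; ++i) {
        const unsigned xi = xstart + nib(xoffs, i);
        const unsigned zi = zstart + nib(zoffs, i);
        assert(xi < 8 && zi < 8);  // wrap-around is not modelled
        const float m = xbuf[xi] * (ones ? 1.0f : zbuf[zi]);
        const float n = (((neg >> i) & 1u) ? -1.0f : 1.0f) * (abs ? std::fabs(m) : m);
        const float o = acc[i] + n;
        const bool sign = std::signbit(o);
        const bool bit = (cmpmode == CmpMode::ge) ? !sign : sign;
        ret[i] = (cmpmode == CmpMode::nrm) ? o : (bit ? -n : acc[i]);
        cmp |= static_cast<unsigned>(bit) << i;
    }
    return ret;
}
```

The model follows the pseudocode literally, including the selection rule ret[i] = cmp[i] ? -n[i] : acc[i]; it makes no claim about rounding or exception behavior of the hardware.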
v8float fpmac_conf ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = signbit(o[i]) ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = ~signbit(o[i]) ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
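How addmode and addmask combine into a per-lane sign can be made concrete with a short helper. This is a sketch only: AddMode is a hypothetical stand-in for the fpadd_* constants, lane_signs is an illustrative name, and bit i of the mask is assumed to belong to lane i.

```cpp
#include <array>

enum class AddMode { add, sub, mixadd, mixsub };  // hypothetical stand-in for fpadd_*

// Per-lane sign (+1 or -1) applied to the product, following
// neg = addmask ^ {0x00, 0xFF, 0xAA, 0x55} from the pseudocode.
inline std::array<int, 8> lane_signs(AddMode mode, unsigned addmask) {
    unsigned flip = 0x00u;
    switch (mode) {
        case AddMode::add:    flip = 0x00u; break;
        case AddMode::sub:    flip = 0xFFu; break;
        case AddMode::mixadd: flip = 0xAAu; break;
        case AddMode::mixsub: flip = 0x55u; break;
    }
    const unsigned neg = (addmask ^ flip) & 0xFFu;
    std::array<int, 8> signs{};
    for (unsigned i = 0; i < 8; ++i)
        signs[i] = ((neg >> i) & 1u) ? -1 : +1;
    return signs;
}
```

With addmask = 0, the mixadd pattern negates the odd lanes and mixsub negates the even lanes. Because the mask is XORed with the mode pattern, a set addmask bit inverts whatever sign the mode would otherwise give that lane.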
v8float fpmac_conf ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zstart: Starting offset for all lanes of the second multiplicand.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set if the accumulator was chosen (per lane).
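In the overloads without zbuf, both operands are read from xbuf, so zstart and zoffs index into the same buffer. The scalar sketch below models only the plain accumulate path (fpadd_add with fpcmp_nrm in the pseudocode). The name mac_same_buffer is hypothetical, lane i is assumed to use nibble i of each offsets word, and wrap-around is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using vec8 = std::array<float, 8>;  // hypothetical stand-in for v8float

// ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
inline vec8 mac_same_buffer(const vec8& acc, const vec8& xbuf,
                            int xstart, std::uint32_t xoffs,
                            int zstart, std::uint32_t zoffs) {
    vec8 ret{};
    for (unsigned i = 0; i < 8; ++i) {
        const unsigned xi = xstart + ((xoffs >> (4 * i)) & 0xFu);
        const unsigned zi = zstart + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < 8 && zi < 8);  // wrap-around is not modelled
        ret[i] = acc[i] + xbuf[xi] * xbuf[zi];
    }
    return ret;
}
```

Passing the same identity offsets (0x76543210) for both operands squares each element, while zoffs = 0 with a fixed zstart multiplies every lane by one chosen element of xbuf.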
v8float fpmac_conf ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = signbit(o[i]) ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = ~signbit(o[i]) ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zstart: Starting offset for all lanes of the second multiplicand.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmac_conf ( v4cfloat  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set if the accumulator was chosen (per lane).
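For the v4cfloat overloads, the statement that each real and imaginary part is handled as a separate lane can be pictured as a flattening of four complex values into eight float lanes. The sketch below assumes an interleaved layout (complex lane k maps to float lanes 2k and 2k+1), which the text does not spell out; to_lanes and from_lanes are hypothetical helper names.

```cpp
#include <array>
#include <complex>

using cfloat = std::complex<float>;

// Assumed layout: complex lane k -> float lane 2k (real) and 2k+1 (imaginary).
inline std::array<float, 8> to_lanes(const std::array<cfloat, 4>& v) {
    std::array<float, 8> out{};
    for (unsigned k = 0; k < 4; ++k) {
        out[2 * k]     = v[k].real();
        out[2 * k + 1] = v[k].imag();
    }
    return out;
}

// Inverse of to_lanes: rebuild four complex values from eight float lanes.
inline std::array<cfloat, 4> from_lanes(const std::array<float, 8>& lanes) {
    std::array<cfloat, 4> out{};
    for (unsigned k = 0; k < 4; ++k)
        out[k] = cfloat(lanes[2 * k], lanes[2 * k + 1]);
    return out;
}
```

Under this assumption, bit i of addmask and of cmp, and nibble i of the offsets, address float lane i rather than complex lane i.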
v4cfloat fpmac_conf ( v4cfloat  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = signbit(o[i]) ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = ~signbit(o[i]) ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
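The effect of the ones and abs switches is easiest to see in isolation. The sketch below keeps only those two switches and uses identity lane selection with plain addition; mac_flags is a hypothetical name and std::array<float, 8> stands in for v8float.

```cpp
#include <array>
#include <cmath>

using vec8 = std::array<float, 8>;  // hypothetical stand-in for v8float

// m = x * (ones ? 1 : z), then ret = acc + (abs ? |m| : m), per lane.
inline vec8 mac_flags(const vec8& acc, const vec8& x, const vec8& z,
                      bool ones, bool abs) {
    vec8 ret{};
    for (unsigned i = 0; i < 8; ++i) {
        const float m = x[i] * (ones ? 1.0f : z[i]);
        ret[i] = acc[i] + (abs ? std::fabs(m) : m);
    }
    return ret;
}
```

With ones = true and abs = true the call reduces to acc + |x|, which accumulates absolute values without needing a second operand. With ones = true and abs = false it is a plain vector addition of acc and the selected X lanes.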
v8float fpmac_conf ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set if the accumulator was chosen (per lane).
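Since xoffs and zoffs are given as "8 x 4 bits", it can be convenient to build them from a list of per-lane indices instead of writing hexadecimal constants by hand. The helper below is a sketch: pack_offsets is a hypothetical name, and it assumes lane 0 occupies the least significant nibble.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack eight per-lane offsets (each 0..15) into one 32-bit word,
// lane 0 in the least significant nibble (assumed ordering).
inline std::uint32_t pack_offsets(const std::array<unsigned, 8>& lanes) {
    std::uint32_t word = 0;
    for (unsigned i = 0; i < 8; ++i) {
        assert(lanes[i] <= 0xFu);  // each offset must fit in 4 bits
        word |= static_cast<std::uint32_t>(lanes[i]) << (4 * i);
    }
    return word;
}
```

Under that ordering, the identity selection {0, 1, ..., 7} packs to 0x76543210 and the reversed selection {7, 6, ..., 0} packs to 0x01234567.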
v8float fpmac_conf ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = signbit(o[i]) ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = ~signbit(o[i]) ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v8float fpmac_conf ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zstart: Starting offset for all lanes of the second multiplicand.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set if the accumulator was chosen (per lane).
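The cmp output packs one flag per lane into its 8 least significant bits. A caller that wants per-lane booleans, or just the number of flagged lanes, could unpack it as sketched below. The helper names cmp_lanes and cmp_count are hypothetical, and bit i is assumed to correspond to lane i.

```cpp
#include <array>

// Expand the 8 LSBs of cmp into one flag per lane (bit i -> lane i, assumed).
inline std::array<bool, 8> cmp_lanes(unsigned cmp) {
    std::array<bool, 8> flags{};
    for (unsigned i = 0; i < 8; ++i)
        flags[i] = ((cmp >> i) & 1u) != 0u;
    return flags;
}

// Number of lanes whose bit is set; bits above the 8 LSBs are ignored.
inline unsigned cmp_count(unsigned cmp) {
    unsigned n = 0;
    for (unsigned i = 0; i < 8; ++i)
        n += (cmp >> i) & 1u;
    return n;
}
```

Ignoring the upper bits keeps the helpers safe even if the variable passed as cmp held unrelated data above bit 7 before the call.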
v8float fpmac_conf ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = signbit(o[i]) ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = ~signbit(o[i]) ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zstart: Starting offset for all lanes of the second multiplicand.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmac_conf ( v4cfloat  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set if the accumulator was chosen (per lane).
v4cfloat fpmac_conf ( v4cfloat  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = signbit(o[i]) ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = ~signbit(o[i]) ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v8float fpmac_conf ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set if the accumulator was chosen (per lane).
v8float fpmac_conf ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = signbit(o[i]) ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = ~signbit(o[i]) ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v8float fpmac_conf ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set for each lane in which the accumulator was chosen.
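
The interaction between addmode and addmask can be easier to follow with a small host-side model. The sketch below reproduces the neg = addmask ^ pattern step of the pseudocode. The AddMode enumeration is a hypothetical stand-in for the fpadd_* constants (their real values come from the intrinsics headers and are not assumed here), and bit i of the mask is assumed to belong to lane i.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for fpadd_add, fpadd_sub, fpadd_mixadd, fpadd_mixsub.
enum class AddMode { add, sub, mixadd, mixsub };

// Per-lane negation bits as in the pseudocode: neg = addmask ^ pattern.
inline std::uint8_t negation_bits(AddMode mode, std::uint8_t addmask) {
    std::uint8_t pattern = 0x00;
    switch (mode) {
        case AddMode::add:    pattern = 0x00; break;  // no lane negated
        case AddMode::sub:    pattern = 0xFF; break;  // every lane negated
        case AddMode::mixadd: pattern = 0xAA; break;  // bits 1, 3, 5, 7
        case AddMode::mixsub: pattern = 0x55; break;  // bits 0, 2, 4, 6
    }
    return static_cast<std::uint8_t>(addmask ^ pattern);
}

// True if the product in the given lane is negated before accumulation.
// Assumes bit i of `neg` corresponds to lane i.
inline bool lane_negated(std::uint8_t neg, unsigned lane) {
    assert(lane < 8);
    return ((neg >> lane) & 1u) != 0;
}
```

With addmask = 0 the mode alone decides the signs. A set bit in addmask flips the decision for that lane, so addmask = 0x0F combined with the mixadd pattern yields 0xA5.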
v8float fpmac_conf ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
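
For checking expected values off-target, the fpcmp_nrm path of the pseudocode above can be transcribed into a scalar loop. The following is a minimal sketch for the overload that takes both operands from xbuf, not the vendor implementation. The name fpmac_nrm_model is hypothetical, nibble i and bit i are assumed to belong to lane i, and the model ignores hardware rounding, exceptions and index wrap-around.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model of the fpcmp_nrm path. `neg` holds the
// per-lane negation bits (addmask XOR the addmode pattern).
inline std::array<float, 8> fpmac_nrm_model(const std::array<float, 8>& acc,
                                            const std::array<float, 32>& xbuf,
                                            int xstart, std::uint32_t xoffs,
                                            int zstart, std::uint32_t zoffs,
                                            bool ones, bool take_abs,
                                            std::uint8_t neg) {
    std::array<float, 8> ret{};
    for (unsigned i = 0; i < 8; ++i) {
        // Lane i uses nibble i of each offsets word (assumed LSB-first).
        const int xi = xstart + static_cast<int>((xoffs >> (4u * i)) & 0xFu);
        const int zi = zstart + static_cast<int>((zoffs >> (4u * i)) & 0xFu);
        assert(xi >= 0 && xi < 32 && zi >= 0 && zi < 32);  // no wrap-around modeled

        const float x = xbuf[static_cast<std::size_t>(xi)];
        const float z = ones ? 1.0f : xbuf[static_cast<std::size_t>(zi)];
        const float m = x * z;
        const float mag = take_abs ? std::fabs(m) : m;
        const float n = ((neg >> i) & 1u) ? -mag : mag;
        ret[i] = acc[i] + n;  // fpcmp_nrm: plain accumulation
    }
    return ret;
}
```

Passing the same start and offsets for X and Z squares each selected element. Setting ones and take_abs instead accumulates the absolute values of the selected elements.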
v4cfloat fpmac_conf ( v4cfloat  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set for each lane in which the accumulator was chosen.
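
The cmp output carries one flag per lane in its 8 least significant bits. The host-side sketch below packs those flags the way the pseudocode computes them: signbit(o[i]) for fpcmp_nrm and fpcmp_lt, and its inverse for fpcmp_ge. CmpMode, pack_cmp and lane_flag are hypothetical names, and bit i is assumed to correspond to lane i.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-ins for fpcmp_nrm, fpcmp_lt and fpcmp_ge.
enum class CmpMode { nrm, lt, ge };

// Pack the per-lane comparison flags of the pseudocode into 8 LSBs.
// `o` is the intermediate sum o[i] = acc[i] + n[i].
inline std::uint32_t pack_cmp(const std::array<float, 8>& o, CmpMode mode) {
    std::uint32_t cmp = 0;
    for (unsigned i = 0; i < 8; ++i) {
        bool bit = std::signbit(o[i]);     // true for negative values and -0.0
        if (mode == CmpMode::ge) bit = !bit;
        if (bit) cmp |= (1u << i);
    }
    return cmp;
}

// Read back the flag of one lane from a packed cmp word.
inline bool lane_flag(std::uint32_t cmp, unsigned lane) {
    assert(lane < 8);
    return ((cmp >> lane) & 1u) != 0;
}
```

Because the flag is the sign bit rather than the result of a less-than comparison, a sum of -0.0 sets the flag in the nrm and lt modes.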
v4cfloat fpmac_conf ( v4cfloat  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
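
Since a v4cfloat accumulator is treated as eight real lanes, it can help to picture the four complex values flattened into an array of eight floats. The sketch below does that on the host with std::complex. It assumes an interleaved layout in which lane 2k holds the real part and lane 2k+1 the imaginary part of element k. That layout is an assumption of this example, and to_lanes and is_imag_lane are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// Flatten four complex values into eight float lanes.
// Assumed layout: lane 2k = real part, lane 2k+1 = imaginary part.
inline std::array<float, 8> to_lanes(
        const std::array<std::complex<float>, 4>& v) {
    std::array<float, 8> lanes{};
    for (std::size_t k = 0; k < 4; ++k) {
        lanes[2 * k]     = v[k].real();
        lanes[2 * k + 1] = v[k].imag();
    }
    return lanes;
}

// True if the lane holds an imaginary part under the assumed layout.
inline bool is_imag_lane(unsigned lane) {
    assert(lane < 8);
    return (lane & 1u) != 0;
}
```

Under this layout, the 0xAA pattern of fpadd_mixadd (with addmask = 0) marks exactly the imaginary lanes for negation and the 0x55 pattern of fpadd_mixsub marks the real lanes.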
v4cfloat fpmac_conf ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmac_conf ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
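
Writing the packed xoffs and zoffs words by hand as hexadecimal literals is error-prone, so a small helper that builds them from per-lane offsets can be convenient in host-side test code. The sketch below is one way to do it. pack_offsets is a hypothetical name, and lane 0 is assumed to map to the least significant nibble.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: build an "8 x 4 bits" offsets word from eight
// per-lane offsets. Assumes lane 0 maps to the least significant nibble.
inline std::uint32_t pack_offsets(const std::array<unsigned, 8>& offs) {
    std::uint32_t word = 0;
    for (unsigned lane = 0; lane < 8; ++lane) {
        assert(offs[lane] < 16);  // each offset must fit in 4 bits
        word |= static_cast<std::uint32_t>(offs[lane]) << (4u * lane);
    }
    return word;
}
```

With this convention the offsets 0, 1, ..., 7 produce 0x76543210, and eight zero offsets produce 0x00000000, which makes every lane read the element at the start index.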
v4cfloat fpmac_conf ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmac_conf ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmac_conf ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmac_conf ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmac_conf ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmac_conf ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmac_conf ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmac_conf ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmac_conf ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When using fpcmp_ge or fpcmp_lt in "cmpmode", a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmac_conf ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmac_conf ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can always be considered to have 8 lanes, because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = acc[i] + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use fpcmp_lt to select the per-lane minimum of the accumulation and multiplication results, fpcmp_ge for the maximum, and fpcmp_nrm for the usual sum.
cmp      8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, a bit is set for each lane in which the accumulator was chosen.
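The sign-selection step of the pseudocode can be checked on a host machine with a scalar model. The sketch below is not the intrinsic: `AddMode`, `negation_mask`, `lane_offset` and `fpmac_conf_nrm_model` are hypothetical names, both buffers are treated as flat arrays of floats, lane 0 is assumed to occupy the least significant nibble of the offset words, and only the fpcmp_nrm path is modelled.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the fpadd_* constants; the real values come from
// the AIE headers and are not reproduced here.
enum class AddMode { Add, Sub, MixAdd, MixSub };

// neg = addmask ^ pattern, exactly as in the pseudocode.
inline unsigned negation_mask(AddMode mode, unsigned addmask) {
    unsigned pattern = 0x00u;
    switch (mode) {
        case AddMode::Add:    pattern = 0x00u; break;
        case AddMode::Sub:    pattern = 0xFFu; break;
        case AddMode::MixAdd: pattern = 0xAAu; break;
        case AddMode::MixSub: pattern = 0x55u; break;
    }
    return (addmask ^ pattern) & 0xFFu;
}

// Assumption: the 4-bit offset of lane 0 is the least significant nibble.
inline unsigned lane_offset(std::uint32_t offs, int lane) {
    return (offs >> (4 * lane)) & 0xFu;
}

// Scalar model of the fpcmp_nrm path only: ret[i] = acc[i] + n[i].
inline std::array<float, 8> fpmac_conf_nrm_model(
    const std::array<float, 8>& acc,
    const std::vector<float>& xbuf, int xstart, std::uint32_t xoffs,
    const std::vector<float>& zbuf, int zstart, std::uint32_t zoffs,
    bool ones, bool take_abs, AddMode addmode, unsigned addmask) {
    const unsigned neg = negation_mask(addmode, addmask);
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        assert(xi < xbuf.size());
        float z = 1.0f;  // used when `ones` replaces the second multiplicand
        if (!ones) {
            const std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
            assert(zi < zbuf.size());
            z = zbuf[zi];
        }
        const float m = xbuf[xi] * z;
        const float mag = take_abs ? std::fabs(m) : m;
        const float n = ((neg >> i) & 1u) ? -mag : mag;
        ret[i] = acc[i] + n;
    }
    return ret;
}
```

With an all-zero addmask, `AddMode::MixAdd` in this model negates the odd lanes (pattern 0xAA) and `AddMode::MixSub` negates the even lanes (pattern 0x55).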
v4cfloat fpmac_conf ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add)    neg = addmask ^ 0x00;
if (addmode == fpadd_sub)    neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
    m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
    n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
    o[i] = acc[i] + n[i]
    if cmpmode == fpcmp_nrm :
        ret[i] = o[i]
    elif cmpmode == fpcmp_lt :
        ret[i] = cmp[i] ? -n[i] : acc[i]
    elif cmpmode == fpcmp_ge :
        ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use fpcmp_lt to select the per-lane minimum of the accumulation and multiplication results, fpcmp_ge for the maximum, and fpcmp_nrm for the usual sum.
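The xoffs and zoffs arguments pack one 4-bit offset per lane into a single word. A small host-side helper makes that packing explicit; `pack_offsets` and `unpack_offsets` are hypothetical names, and the sketch assumes lane 0 occupies the least significant nibble.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack eight per-lane offsets (each 0..15) into one 32-bit word.
// Assumption: lane 0 occupies the least significant nibble.
inline std::uint32_t pack_offsets(const std::array<unsigned, 8>& lanes) {
    std::uint32_t word = 0;
    for (int i = 0; i < 8; ++i) {
        assert(lanes[i] <= 0xFu);  // each field is only 4 bits wide
        word |= static_cast<std::uint32_t>(lanes[i]) << (4 * i);
    }
    return word;
}

// Inverse of pack_offsets: recover the per-lane offsets from the word.
inline std::array<unsigned, 8> unpack_offsets(std::uint32_t word) {
    std::array<unsigned, 8> lanes{};
    for (int i = 0; i < 8; ++i) {
        lanes[i] = (word >> (4 * i)) & 0xFu;
    }
    return lanes;
}
```

Under that assumption, offsets 0 through 7 in lane order pack to 0x76543210, and the same offset in every lane (for example 0x33333333) makes all lanes read one element.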
v4cfloat fpmac_conf ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add)    neg = addmask ^ 0x00;
if (addmode == fpadd_sub)    neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
    m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
    n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
    o[i] = acc[i] + n[i]
    if cmpmode == fpcmp_nrm :
        cmp[i] = signbit(o[i])
        ret[i] = o[i]
    elif cmpmode == fpcmp_lt :
        cmp[i] = signbit(o[i])
        ret[i] = cmp[i] ? -n[i] : acc[i]
    elif cmpmode == fpcmp_ge :
        cmp[i] = ~signbit(o[i])
        ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use fpcmp_lt to select the per-lane minimum of the accumulation and multiplication results, fpcmp_ge for the maximum, and fpcmp_nrm for the usual sum.
cmp      8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmac_conf ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add)    neg = addmask ^ 0x00;
if (addmode == fpadd_sub)    neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
    m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
    n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
    o[i] = acc[i] + n[i]
    if cmpmode == fpcmp_nrm :
        ret[i] = o[i]
    elif cmpmode == fpcmp_lt :
        ret[i] = cmp[i] ? -n[i] : acc[i]
    elif cmpmode == fpcmp_ge :
        ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use fpcmp_lt to select the per-lane minimum of the accumulation and multiplication results, fpcmp_ge for the maximum, and fpcmp_nrm for the usual sum.
v4cfloat fpmac_conf ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add)    neg = addmask ^ 0x00;
if (addmode == fpadd_sub)    neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
    m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
    n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
    o[i] = acc[i] + n[i]
    if cmpmode == fpcmp_nrm :
        cmp[i] = signbit(o[i])
        ret[i] = o[i]
    elif cmpmode == fpcmp_lt :
        cmp[i] = signbit(o[i])
        ret[i] = cmp[i] ? -n[i] : acc[i]
    elif cmpmode == fpcmp_ge :
        cmp[i] = ~signbit(o[i])
        ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use fpcmp_lt to select the per-lane minimum of the accumulation and multiplication results, fpcmp_ge for the maximum, and fpcmp_nrm for the usual sum.
cmp      8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmac_conf ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply and accumulate for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add)    neg = addmask ^ 0x00;
if (addmode == fpadd_sub)    neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
    m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
    n[i] = (neg[i] ? -1 : +1) * (abs ? |m[i]| : m[i])
    o[i] = acc[i] + n[i]
    if cmpmode == fpcmp_nrm :
        ret[i] = o[i]
    elif cmpmode == fpcmp_lt :
        ret[i] = cmp[i] ? -n[i] : acc[i]
    elif cmpmode == fpcmp_ge :
        ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use fpcmp_lt to select the per-lane minimum of the accumulation and multiplication results, fpcmp_ge for the maximum, and fpcmp_nrm for the usual sum.
v4cfloat fpmac_nc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs    4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
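The two-cycle note can be illustrated on the host by splitting the complex-times-conjugate product into two real-valued passes. The sketch below is a scalar model and not the intrinsic: `fpmac_nc_model`, `fpmac_nc_two_pass` and `lane_index` are hypothetical names, lane 0 is assumed to occupy the least significant nibble, and the split shown is one plausible way to form the two terms, not necessarily the one the hardware uses.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

using cfloat = std::complex<float>;

// Index of the complex element read by `lane`.
// Assumption: lane 0 sits in the least significant nibble.
inline std::size_t lane_index(int start, std::uint32_t offs, int lane) {
    const unsigned off = (offs >> (4 * lane)) & 0xFu;
    assert(off <= 7u);  // documented range per lane is [0,7]
    return static_cast<std::size_t>(start) + off;
}

// Direct model: ret[i] = acc[i] + x * conj(z).
inline std::array<cfloat, 4> fpmac_nc_model(
    const std::array<cfloat, 4>& acc,
    const std::vector<cfloat>& xbuf, int xstart, std::uint32_t xoffs,
    const std::vector<cfloat>& zbuf, int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const cfloat x = xbuf.at(lane_index(xstart, xoffs, i));
        const cfloat z = zbuf.at(lane_index(zstart, zoffs, i));
        ret[i] = acc[i] + x * std::conj(z);
    }
    return ret;
}

// Same result built from two real-valued passes, one per term of the product.
inline std::array<cfloat, 4> fpmac_nc_two_pass(
    const std::array<cfloat, 4>& acc,
    const std::vector<cfloat>& xbuf, int xstart, std::uint32_t xoffs,
    const std::vector<cfloat>& zbuf, int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret = acc;
    // Pass 1: re(x) times (re(z), -im(z)).
    for (int i = 0; i < 4; ++i) {
        const cfloat x = xbuf.at(lane_index(xstart, xoffs, i));
        const cfloat z = zbuf.at(lane_index(zstart, zoffs, i));
        ret[i] += cfloat(x.real() * z.real(), -x.real() * z.imag());
    }
    // Pass 2: im(x) times (im(z), re(z)).
    for (int i = 0; i < 4; ++i) {
        const cfloat x = xbuf.at(lane_index(xstart, xoffs, i));
        const cfloat z = zbuf.at(lane_index(zstart, zoffs, i));
        ret[i] += cfloat(x.imag() * z.imag(), x.imag() * z.real());
    }
    return ret;
}
```

Each pass touches every real and imaginary output once, which matches the 8-lane view that fpmac_conf takes of a v4cfloat accumulator.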
v4cfloat fpmac_nc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs    4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart   Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_nc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs    4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_nc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs    4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart   Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_nc ( v4cfloat  acc,
v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
zbuf     Second multiplication input buffer.
v4cfloat fpmac_nc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs    4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmac_nc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and accumulate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] + xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs    4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart   Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v8float fpmsc ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for Z.
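For checking expected values on a host machine, the real-times-real multiply-subtract can be modelled with plain arrays. This is a sketch under stated assumptions, not the intrinsic itself: `fpmsc_model` is a hypothetical name, the buffers are ordinary float vectors, and lane 0 is assumed to occupy the least significant nibble of each offset word.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]].
// Assumption: the 4-bit offset of lane 0 is the least significant nibble.
inline std::array<float, 8> fpmsc_model(
    const std::array<float, 8>& acc,
    const std::vector<float>& xbuf, int xstart, std::uint32_t xoffs,
    const std::vector<float>& zbuf, int zstart, std::uint32_t zoffs) {
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        const std::size_t zi = static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < xbuf.size() && zi < zbuf.size());
        ret[i] = acc[i] - xbuf[xi] * zbuf[zi];
    }
    return ret;
}
```

In this model, passing 0x00000000 as zoffs makes every lane read zbuf[zstart], which subtracts a scaled copy of the selected X elements from the accumulator.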
v8float fpmsc ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for the second multiplicand for all lanes of X.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
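In this overload both operands are read from xbuf, so zstart and zoffs simply select a second set of elements from the same buffer. A scalar sketch of that indexing follows; `fpmsc_self_model` is a hypothetical name and lane 0 is assumed to occupy the least significant nibble of each offset word.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]].
// Both factors come from the same buffer.
// Assumption: the 4-bit offset of lane 0 is the least significant nibble.
inline std::array<float, 8> fpmsc_self_model(
    const std::array<float, 8>& acc,
    const std::vector<float>& xbuf,
    int xstart, std::uint32_t xoffs,
    int zstart, std::uint32_t zoffs) {
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        const std::size_t a = static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        const std::size_t b = static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(a < xbuf.size() && b < xbuf.size());
        ret[i] = acc[i] - xbuf[a] * xbuf[b];
    }
    return ret;
}
```

With identical offset words and zstart = xstart + 1, each lane of the model subtracts the product of two neighbouring elements; with zstart = xstart it subtracts the square of each element.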
v4cfloat fpmsc ( v4cfloat  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    4 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
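For the real-times-complex case, each of the four lanes scales one complex Z element by one real X element. The scalar sketch below models that on the host; `fpmsc_rc_model` is a hypothetical name, `std::complex<float>` stands in for the cfloat element type, and lane 0 is assumed to occupy the least significant nibble of each offset word.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

using cfloat = std::complex<float>;

// Scalar model of ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
// with real X and complex Z. X offsets count real elements, Z offsets count
// complex elements. Assumption: lane 0 is the least significant nibble.
inline std::array<cfloat, 4> fpmsc_rc_model(
    const std::array<cfloat, 4>& acc,
    const std::vector<float>& xbuf, int xstart, std::uint32_t xoffs,
    const std::vector<cfloat>& zbuf, int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const unsigned zo = (zoffs >> (4 * i)) & 0xFu;
        assert(zo <= 7u);  // documented range per lane is [0,7]
        const std::size_t xi = static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        const std::size_t zi = static_cast<std::size_t>(zstart) + zo;
        assert(xi < xbuf.size() && zi < zbuf.size());
        ret[i] = acc[i] - xbuf[xi] * zbuf[zi];
    }
    return ret;
}
```

The real factor multiplies the real and imaginary parts of the Z element alike, so a single real product per output float is enough in this model.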
v8float fpmsc ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpmsc ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for the second multiplicand for all lanes of X.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v8float fpmsc ( v8float  acc,
v8float  xbuf,
v8float  zbuf 
)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
zbuf     Second multiplication input buffer.
v4cfloat fpmsc ( v4cfloat  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    4 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc ( v4cfloat  acc,
v4float  xbuf,
v4cfloat  zbuf 
)

Multiply and subtract for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
zbuf     Second multiplication input buffer.
v8float fpmsc ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpmsc ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for the second multiplicand for all lanes of X.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v4cfloat fpmsc ( v4cfloat  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    4 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs    4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs    4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmsc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc      Incoming accumulation vector.
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs    4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs    4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
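The lane selection for the complex times complex form can be checked on a host machine with a scalar model of the pseudocode. The sketch below is such a model, not the intrinsic itself: fpmsc_ref is a hypothetical name, std::complex<float> stands in for one complex lane, lane 0 is assumed to use the lowest 4 bits of each offset word, and no wrap-around of indices is modelled.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;  // stand-in for one complex lane

// Hypothetical scalar model of
//   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
// Assumption: lane i uses the i-th 4-bit nibble of each offset word.
template <std::size_t NX, std::size_t NZ>
std::array<cfloat, 4> fpmsc_ref(const std::array<cfloat, 4>& acc,
                                const std::array<cfloat, NX>& xbuf,
                                int xstart, std::uint32_t xoffs,
                                const std::array<cfloat, NZ>& zbuf,
                                int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const std::size_t xi =
            static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        const std::size_t zi =
            static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < NX && zi < NZ);  // the model does not wrap indices
        ret[i] = acc[i] - xbuf[xi] * zbuf[zi];
    }
    return ret;
}
```

With xoffs = 0x3210 each lane reads its own X element, while zstart = 1 with zoffs = 0x0000 broadcasts the second Z element to all four lanes.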
v4cfloat fpmsc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
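In this form both factors are read from xbuf, so zstart and zoffs select a second element of the same buffer. A scalar sketch of that behavior follows; fpmsc_self_ref is a hypothetical name, and the nibble order (lane 0 lowest) and the absence of index wrap-around are assumptions.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;  // stand-in for one complex lane

// Hypothetical scalar model of the single-buffer form:
//   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
// Assumption: lane i uses the i-th 4-bit nibble of each offset word.
template <std::size_t NX>
std::array<cfloat, 4> fpmsc_self_ref(const std::array<cfloat, 4>& acc,
                                     const std::array<cfloat, NX>& xbuf,
                                     int xstart, std::uint32_t xoffs,
                                     int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const std::size_t first =
            static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        const std::size_t second =
            static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(first < NX && second < NX);  // both indices address xbuf
        ret[i] = acc[i] - xbuf[first] * xbuf[second];
    }
    return ret;
}
```

Passing the same start and offsets for both factors subtracts the square of each selected element; a rotated zoffs such as 0x0321 pairs each element with its neighbour instead.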
v4cfloat fpmsc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z.
v4cfloat fpmsc ( v4cfloat  acc,
v4cfloat  xbuf,
v8float  zbuf 
)

Multiply and subtract for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
zbuf: Second multiplication input buffer.
v4cfloat fpmsc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
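The restriction that the highest bit of each 4-bit lane field must be 0 can be checked on the host before an offset word is passed to a complex variant. A minimal sketch, assuming lane 0 sits in the lowest nibble; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: 4-bit offset of one complex lane.
// Assumption: lane 0 occupies the lowest nibble of the word.
inline unsigned complex_lane_offset(std::uint32_t offs, int lane) {
    assert(lane >= 0 && lane < 4);  // complex variants use 4 lanes
    return (offs >> (4 * lane)) & 0xFu;
}

// Returns true when every lane respects the documented [0,7] range,
// i.e. the highest bit of each 4-bit field is clear.
inline bool complex_offsets_in_range(std::uint32_t offs) {
    for (int lane = 0; lane < 4; ++lane) {
        if (complex_lane_offset(offs, lane) > 7u) {
            return false;
        }
    }
    return true;
}
```

For example, 0x7654 passes the check, while 0x8210 fails because lane 3 would hold the value 8.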
v4cfloat fpmsc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc ( v4cfloat  acc,
v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
zbuf: Second multiplication input buffer.
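The two terms mentioned in the note come from splitting the complex product by the real and imaginary part of one factor. The sketch below shows only that arithmetic split for a single lane. It does not reproduce the actual fpmac_conf parameterization, and the type cf and the function names are hypothetical.

```cpp
#include <cassert>

struct cf { float re; float im; };  // one complex lane (illustrative type)

// One real-valued pass of acc - x * z.
// pass 0 uses the real part of x, pass 1 uses the imaginary part of x.
inline cf msc_pass(cf acc, cf x, cf z, int pass) {
    assert(pass == 0 || pass == 1);
    if (pass == 0) {
        acc.re -= x.re * z.re;
        acc.im -= x.re * z.im;
    } else {
        // Imaginary part of x times the swapped components of z;
        // i * i = -1 turns the subtraction on the real lane into an addition.
        acc.re += x.im * z.im;
        acc.im -= x.im * z.re;
    }
    return acc;
}

// Two passes together give acc - x * z for one complex lane.
inline cf msc_two_pass(cf acc, cf x, cf z) {
    return msc_pass(msc_pass(acc, x, z, 0), x, z, 1);
}
```

For acc = 10 + 20i, x = 1 + 2i and z = 3 + 4i, the product x * z is -5 + 10i, so both the direct formula and the two-pass form give 15 + 10i.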
v4cfloat fpmsc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z.
v4cfloat fpmsc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
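Because each per-lane offset is limited to [0,7], reading the pseudocode literally means that the upper half of a 16-element X buffer is reached through xstart. The sketch below computes the element index one lane would read under that literal reading. lane_index is a hypothetical helper, lane 0 is assumed to use the lowest nibble, and any index wrap-around performed by the hardware is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: element index read by one lane, following
// xbuf[xstart + xoffs[i]] from the pseudocode.
// Assumptions: lane 0 is the lowest nibble; no wrap-around is modelled.
inline int lane_index(int start, std::uint32_t offs, int lane) {
    assert(lane >= 0 && lane < 4);  // complex variants use 4 lanes
    return start + static_cast<int>((offs >> (4 * lane)) & 0xFu);
}
```

For example, xstart = 8 with xoffs = 0x3210 selects complex elements 8 to 11, and xstart = 12 with the same offsets selects elements 12 to 15.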
v4cfloat fpmsc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v8float fpmsc_abs ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for Z.
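The real-valued form uses eight lanes, so each offset word carries eight 4-bit fields. The following scalar sketch models the pseudocode on the host. fpmsc_abs_ref is a hypothetical name, lane 0 is assumed to use the lowest nibble, and index wrap-around is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of
//   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
// Assumption: lane i uses the i-th 4-bit nibble of each offset word.
template <std::size_t NX, std::size_t NZ>
std::array<float, 8> fpmsc_abs_ref(const std::array<float, 8>& acc,
                                   const std::array<float, NX>& xbuf,
                                   int xstart, std::uint32_t xoffs,
                                   const std::array<float, NZ>& zbuf,
                                   int zstart, std::uint32_t zoffs) {
    std::array<float, 8> ret{};
    for (int i = 0; i < 8; ++i) {
        const std::size_t xi =
            static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        const std::size_t zi =
            static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < NX && zi < NZ);  // the model does not wrap indices
        // The absolute value is taken of the product, not of the factors.
        ret[i] = acc[i] - std::fabs(xbuf[xi] * zbuf[zi]);
    }
    return ret;
}
```

Under the stated nibble order, xoffs = 0x76543210 selects the X elements in order and xoffs = 0x01234567 selects them in reverse.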
v8float fpmsc_abs ( v8float  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zstart: Starting offset for the second multiplicand for all lanes of X.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand in X.
v8float fpmsc_abs ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for Z.
v8float fpmsc_abs ( v8float  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zstart: Starting offset for the second multiplicand for all lanes of X.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand in X.
v8float fpmsc_abs ( v8float  acc,
v8float  xbuf,
v8float  zbuf 
)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - abs(xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
zbuf: Second multiplication input buffer.
v8float fpmsc_abs ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for Z.
v8float fpmsc_abs ( v8float  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and subtract for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = acc[i] - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zstart: Starting offset for the second multiplicand for all lanes of X.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand in X.
v4cfloat fpmsc_c ( v4cfloat  acc,
v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 4 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
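For a real factor times a conjugated complex factor, the conjugate only changes the sign of the imaginary contribution. The scalar sketch below writes this out per component. fpmsc_c_ref is a hypothetical name, std::complex<float> stands in for one complex lane, lane 0 is assumed to use the lowest nibble, and index wrap-around is not modelled.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;  // stand-in for one complex lane

// Hypothetical scalar model of
//   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
// with a real X buffer and a complex Z buffer.
template <std::size_t NX, std::size_t NZ>
std::array<cfloat, 4> fpmsc_c_ref(const std::array<cfloat, 4>& acc,
                                  const std::array<float, NX>& xbuf,
                                  int xstart, std::uint32_t xoffs,
                                  const std::array<cfloat, NZ>& zbuf,
                                  int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const std::size_t xi =
            static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        const std::size_t zi =
            static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < NX && zi < NZ);  // the model does not wrap indices
        const float x = xbuf[xi];
        const cfloat z = zbuf[zi];
        // x * conj(z) = (x * z.re, -x * z.im): the real part is subtracted,
        // the imaginary part is effectively added.
        ret[i] = cfloat(acc[i].real() - x * z.real(),
                        acc[i].imag() + x * z.imag());
    }
    return ret;
}
```

With acc = 0, x = 2 and z = 1 + 3i, the lane result is -2 + 6i.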
v4cfloat fpmsc_c ( v4cfloat  acc,
v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 4 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_c ( v4cfloat  acc,
v4float  xbuf,
v4cfloat  zbuf 
)

Multiply and subtract for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
zbuf: Second multiplication input buffer.
v4cfloat fpmsc_c ( v4cfloat  acc,
v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 4 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_c ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z.
v4cfloat fpmsc_c ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z.
v4cfloat fpmsc_c ( v4cfloat  acc,
v4cfloat  xbuf,
v8float  zbuf 
)

Multiply and subtract for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
zbuf: Second multiplication input buffer.
v4cfloat fpmsc_c ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z.
v4cfloat fpmsc_cc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
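Since conj(x) * conj(z) equals conj(x * z), the doubly conjugated form can be modelled on the host with a single conjugate applied to the product. The sketch below does this; fpmsc_cc_ref is a hypothetical name, std::complex<float> stands in for one complex lane, lane 0 is assumed to use the lowest nibble, and index wrap-around is not modelled.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;  // stand-in for one complex lane

// Hypothetical scalar model of
//   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
// Assumption: lane i uses the i-th 4-bit nibble of each offset word.
template <std::size_t NX, std::size_t NZ>
std::array<cfloat, 4> fpmsc_cc_ref(const std::array<cfloat, 4>& acc,
                                   const std::array<cfloat, NX>& xbuf,
                                   int xstart, std::uint32_t xoffs,
                                   const std::array<cfloat, NZ>& zbuf,
                                   int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (int i = 0; i < 4; ++i) {
        const std::size_t xi =
            static_cast<std::size_t>(xstart) + ((xoffs >> (4 * i)) & 0xFu);
        const std::size_t zi =
            static_cast<std::size_t>(zstart) + ((zoffs >> (4 * i)) & 0xFu);
        assert(xi < NX && zi < NZ);  // the model does not wrap indices
        // conj(x) * conj(z) == conj(x * z), so one conjugate suffices.
        ret[i] = acc[i] - std::conj(xbuf[xi] * zbuf[zi]);
    }
    return ret;
}
```

With acc = 0, x = 1 + 2i and z = 3 - i, the product x * z is 5 + 5i, its conjugate is 5 - 5i, and the lane result is -5 + 5i.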
v4cfloat fpmsc_cc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_cc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_cc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: it is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, once for each of the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc: Incoming accumulation vector.
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0 (range per lane: [0,7]). The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
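In this overload both multiplicands are read from xbuf, so zstart and zoffs index the same buffer as xstart and xoffs. A scalar reference model can make the lane selection concrete. The sketch below is an illustration on the host, not the intrinsic itself: the names ref_fpmsc_cc_x and lane_offset are hypothetical, std::array stands in for the vector types, lane 0 is assumed to use the least significant 4 bits of each offsets word, and out-of-range indices are rejected by an assertion as a modeling choice.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;

// Assumption: lane i uses bits [4*i+3 : 4*i] of the packed offsets word.
inline std::size_t lane_offset(std::uint32_t offs, std::size_t lane) {
    return (offs >> (4u * lane)) & 0xFu;
}

// Scalar reference model (hypothetical name); xbuf stands in for v8cfloat.
// Both multiplicands are read from xbuf, as in the pseudocode above.
inline std::array<cfloat, 4> ref_fpmsc_cc_x(const std::array<cfloat, 4>& acc,
                                            const std::array<cfloat, 8>& xbuf,
                                            int xstart, std::uint32_t xoffs,
                                            int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (std::size_t i = 0; i < 4; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        const std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
        // Modeling choice: indices must stay inside the buffer.
        assert(xi < xbuf.size() && zi < xbuf.size());
        ret[i] = acc[i] - std::conj(xbuf[xi]) * std::conj(xbuf[zi]);
    }
    return ret;
}
```

With xstart = 0, zstart = 4 and both offset words set to 0x3210, lane i multiplies the conjugates of xbuf[i] and xbuf[i + 4], that is, the lower half of the buffer against the upper half.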
v4cfloat fpmsc_cc ( v4cfloat  acc,
v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[i]) * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
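The note about two microcode instructions can be read as splitting each complex product into two terms that only multiply real numbers. The sketch below compares the direct form of the pseudocode with such a two-pass form. It is a scalar model on the host with hypothetical function names. It does not show how fpmac_conf would be configured for each pass.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;
using vec4c  = std::array<cfloat, 4>;  // stand-in for v4cfloat

// Direct form of the pseudocode above.
inline vec4c ref_fpmsc_cc(const vec4c& acc, const vec4c& xbuf, const vec4c& zbuf) {
    vec4c ret{};
    for (std::size_t i = 0; i < 4; ++i) {
        ret[i] = acc[i] - std::conj(xbuf[i]) * std::conj(zbuf[i]);
    }
    return ret;
}

// Two-pass form. With x = a + bi and z = c + di:
//   conj(x) * conj(z) = a * (c, -d) + b * (-d, -c)
inline vec4c ref_fpmsc_cc_two_pass(const vec4c& acc, const vec4c& xbuf, const vec4c& zbuf) {
    vec4c ret = acc;
    for (std::size_t i = 0; i < 4; ++i) {  // pass 1: real part of x
        const float a = xbuf[i].real();
        ret[i] -= cfloat(a * zbuf[i].real(), a * -zbuf[i].imag());
    }
    for (std::size_t i = 0; i < 4; ++i) {  // pass 2: imaginary part of x
        const float b = xbuf[i].imag();
        ret[i] -= cfloat(b * -zbuf[i].imag(), b * -zbuf[i].real());
    }
    return ret;
}

// Usage sketch: both forms agree for small integer-valued inputs,
// which are represented exactly in float.
inline void check_two_pass() {
    const vec4c acc{cfloat(10.0f, 0.0f), cfloat(0.0f, 10.0f), cfloat(1.0f, 1.0f), cfloat(0.0f, 0.0f)};
    const vec4c x{cfloat(1.0f, 2.0f), cfloat(3.0f, -1.0f), cfloat(0.0f, 4.0f), cfloat(-2.0f, 0.0f)};
    const vec4c z{cfloat(2.0f, 1.0f), cfloat(1.0f, 1.0f), cfloat(5.0f, -3.0f), cfloat(7.0f, 2.0f)};
    assert(ref_fpmsc_cc(acc, x, z) == ref_fpmsc_cc_two_pass(acc, x, z));
}
```

For general inputs the two forms may differ in the last bits because the rounding order changes. The check uses exactly representable values for that reason.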
v4cfloat fpmsc_cc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_cc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_cn ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_cn ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_cn ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_cn ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_cn ( v4cfloat  acc,
v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
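As a concrete reading of the pseudocode, passing the same vector as both xbuf and zbuf subtracts the squared magnitude of each element, because conj(x) * x is purely real. The sketch below is a scalar model on the host with hypothetical names. It illustrates the arithmetic only and makes no claim about how the intrinsic is typically used.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;
using vec4c  = std::array<cfloat, 4>;  // stand-in for v4cfloat

// Scalar reference model (hypothetical name), not the hardware intrinsic.
inline vec4c ref_fpmsc_cn(const vec4c& acc, const vec4c& xbuf, const vec4c& zbuf) {
    vec4c ret{};
    for (std::size_t i = 0; i < 4; ++i) {
        ret[i] = acc[i] - std::conj(xbuf[i]) * zbuf[i];
    }
    return ret;
}

// Usage sketch: with xbuf == zbuf, each lane subtracts |x|^2 from acc.
inline void check_cn_energy() {
    const vec4c acc{cfloat(30.0f, 0.0f), cfloat(30.0f, 0.0f), cfloat(30.0f, 0.0f), cfloat(30.0f, 0.0f)};
    const vec4c x{cfloat(3.0f, 4.0f), cfloat(1.0f, -1.0f), cfloat(0.0f, 2.0f), cfloat(-5.0f, 0.0f)};
    const vec4c r = ref_fpmsc_cn(acc, x, x);
    assert(r[0] == cfloat(5.0f, 0.0f));   // 30 - (9 + 16)
    assert(r[1] == cfloat(28.0f, 0.0f));  // 30 - (1 + 1)
    assert(r[3] == cfloat(5.0f, 0.0f));   // 30 - 25
}
```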
v4cfloat fpmsc_cn ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_cn ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_nc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_nc ( v4cfloat  acc,
v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_nc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_nc ( v4cfloat  acc,
v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_nc ( v4cfloat  acc,
v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
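Going by the pseudocode alone, the _nc and _cn variants differ only in which operand is conjugated, so swapping the two inputs turns one into the other: xbuf[i] * conj(zbuf[i]) equals conj(zbuf[i]) * xbuf[i]. The sketch below checks this on scalar host models with hypothetical names. It says nothing about scheduling or register use on the device.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;
using vec4c  = std::array<cfloat, 4>;  // stand-in for v4cfloat

// Scalar reference models (hypothetical names), not the hardware intrinsics.
inline vec4c ref_fpmsc_nc(const vec4c& acc, const vec4c& xbuf, const vec4c& zbuf) {
    vec4c ret{};
    for (std::size_t i = 0; i < 4; ++i) {
        ret[i] = acc[i] - xbuf[i] * std::conj(zbuf[i]);
    }
    return ret;
}

inline vec4c ref_fpmsc_cn(const vec4c& acc, const vec4c& xbuf, const vec4c& zbuf) {
    vec4c ret{};
    for (std::size_t i = 0; i < 4; ++i) {
        ret[i] = acc[i] - std::conj(xbuf[i]) * zbuf[i];
    }
    return ret;
}

// Usage sketch: _nc with (x, z) matches _cn with the operands swapped.
inline void check_nc_matches_swapped_cn() {
    const vec4c acc{cfloat(1.0f, 0.0f), cfloat(0.0f, 1.0f), cfloat(2.0f, 2.0f), cfloat(0.0f, 0.0f)};
    const vec4c x{cfloat(1.0f, 2.0f), cfloat(3.0f, -1.0f), cfloat(0.0f, 4.0f), cfloat(-2.0f, 0.0f)};
    const vec4c z{cfloat(3.0f, -1.0f), cfloat(1.0f, 1.0f), cfloat(5.0f, -3.0f), cfloat(7.0f, 2.0f)};
    assert(ref_fpmsc_nc(acc, x, z) == ref_fpmsc_cn(acc, z, x));
}
```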
v4cfloat fpmsc_nc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmsc_nc ( v4cfloat  acc,
v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and subtract for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = acc[i] - xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic which will result in two microcode instructions to be scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice in order to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
acc  Incoming accumulation vector.
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v8float fpmul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
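For the real-valued fpmul, each of the 8 lanes has its own 4-bit offset in xoffs and zoffs. The sketch below models the pseudocode on the host. The names ref_fpmul and lane_offset are hypothetical, std::array stands in for v8float, lane 0 is assumed to use the least significant 4 bits, and out-of-range indices are rejected by an assertion as a modeling choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using vec8f = std::array<float, 8>;  // stand-in for v8float

// Assumption: lane i uses bits [4*i+3 : 4*i] of the packed offsets word.
inline std::size_t lane_offset(std::uint32_t offs, std::size_t lane) {
    return (offs >> (4u * lane)) & 0xFu;
}

// Scalar reference model (hypothetical name), not the hardware intrinsic.
inline vec8f ref_fpmul(const vec8f& xbuf, int xstart, std::uint32_t xoffs,
                       const vec8f& zbuf, int zstart, std::uint32_t zoffs) {
    vec8f ret{};
    for (std::size_t i = 0; i < 8; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        const std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
        // Modeling choice: indices must stay inside the buffers.
        assert(xi < xbuf.size() && zi < zbuf.size());
        ret[i] = xbuf[xi] * zbuf[zi];
    }
    return ret;
}
```

Under these assumptions, xoffs = 0x76543210 reads xbuf in order, and zoffs = 0x00000000 makes every lane read zbuf[zstart], so the call scales xbuf by a single element of zbuf. Reversing the pattern to 0x01234567 reads xbuf back to front.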
v8float fpmul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v4cfloat fpmul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
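In the real times complex overload, xstart and xoffs index real elements of xbuf, while zstart and zoffs index complex elements of zbuf, and each of the 4 lanes scales one complex value by one real value. The sketch below models this on the host. The names are hypothetical, std::array and std::complex stand in for the vector types, lane 0 is assumed to use the least significant 4 bits, and the model simply requires indices to stay in range.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;

// Assumption: lane i uses bits [4*i+3 : 4*i] of the packed offsets word.
inline std::size_t lane_offset(std::uint32_t offs, std::size_t lane) {
    return (offs >> (4u * lane)) & 0xFu;
}

// Scalar reference model (hypothetical name), not the hardware intrinsic.
// xbuf stands in for v8float, zbuf for v4cfloat.
inline std::array<cfloat, 4> ref_fpmul_rc(const std::array<float, 8>& xbuf,
                                          int xstart, std::uint32_t xoffs,
                                          const std::array<cfloat, 4>& zbuf,
                                          int zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (std::size_t i = 0; i < 4; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        const std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
        // Modeling choice: indices must stay inside the buffers.
        assert(xi < xbuf.size() && zi < zbuf.size());
        ret[i] = xbuf[xi] * zbuf[zi];  // real scalar times complex value
    }
    return ret;
}
```

With xstart = 4 and both offset words set to 0x3210, lane i multiplies zbuf[i] by xbuf[4 + i], so the upper four reals act as per-lane gains.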
v8float fpmul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpmul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v8float fpmul ( v8float  xbuf,
v8float  zbuf 
)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
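For the real times complex case, the Z start and offsets count complex elements, and each Z lane offset must stay in [0,7]. The scalar sketch below models that, including a check for the highest-bit rule. All names are hypothetical, and the nibble order (lane 0 in the least significant nibble) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;

// Assumption: lane 0 uses the least significant nibble.
inline std::size_t lane_nibble(std::uint32_t offs, unsigned lane) {
    return (offs >> (4u * lane)) & 0xFu;
}

// True if each of the 4 lane offsets has its highest (4th) bit clear,
// i.e. lies in [0,7], as required for offsets into complex lanes.
inline bool complex_offsets_valid(std::uint32_t offs) {
    for (unsigned lane = 0; lane < 4; ++lane) {
        if (lane_nibble(offs, lane) > 7u) return false;
    }
    return true;
}

// Scalar model of the 4-lane real x complex multiply; not the intrinsic.
template <std::size_t NX, std::size_t NZ>
std::array<cfloat, 4> fpmul_real_complex_model(const std::array<float, NX>& xbuf,
                                               int xstart, std::uint32_t xoffs,
                                               const std::array<cfloat, NZ>& zbuf,
                                               int zstart, std::uint32_t zoffs) {
    assert(xstart >= 0 && zstart >= 0);
    assert(complex_offsets_valid(zoffs));
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        // zstart and zoffs count complex elements, not individual floats
        const float x  = xbuf.at(static_cast<std::size_t>(xstart) + lane_nibble(xoffs, i));
        const cfloat z = zbuf.at(static_cast<std::size_t>(zstart) + lane_nibble(zoffs, i));
        ret[i] = x * z;
    }
    return ret;
}
```

In this model `zoffs = 0x0123` reverses the order of four complex Z values, while `0x8210` is rejected because lane 3 would have its highest bit set.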
v4cfloat fpmul ( v4float  xbuf,
v4cfloat  zbuf 
)

Multiply for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v8float fpmul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpmul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset of the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.
v4cfloat fpmul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
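When offsets are computed at run time, the per-lane values have to be packed into one word with 4 bits per lane. The helper below is a hypothetical sketch of that packing, not part of the intrinsics API; it assumes lane 0 goes into the least significant nibble.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack per-lane offsets into one word, 4 bits per lane.
// Assumption: lane 0 goes into the least significant nibble.
template <std::size_t Lanes>
std::uint32_t pack_offsets(const std::array<unsigned, Lanes>& lane_offsets) {
    static_assert(Lanes <= 8, "a 32-bit word holds at most 8 nibbles");
    std::uint32_t word = 0;
    for (std::size_t lane = 0; lane < Lanes; ++lane) {
        assert(lane_offsets[lane] <= 0xFu);  // each offset must fit in 4 bits
        word |= static_cast<std::uint32_t>(lane_offsets[lane]) << (4u * lane);
    }
    return word;
}
```

For offsets into complex lanes the caller would additionally keep every value in [0,7], as the parameter descriptions require.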
v4cfloat fpmul ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
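The two-cycle note for this intrinsic refers to the two terms of a complex multiplication. As a purely mathematical sketch, (a + bi)(c + di) can be split into a(c + di) plus b(-d + ci), which is one multiply followed by one multiply-accumulate. The names below are hypothetical, and the sketch makes no claim about how the hardware or fpmac_conf orders or configures these steps.

```cpp
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;

// First term: real part of x times z.
inline cfloat term_real(cfloat x, cfloat z) {
    return cfloat(x.real() * z.real(), x.real() * z.imag());
}

// Second term: imaginary part of x times (i * z) = (-z.imag, z.real).
inline cfloat term_imag(cfloat x, cfloat z) {
    return cfloat(-x.imag() * z.imag(), x.imag() * z.real());
}

// Complex product assembled in two steps.
inline cfloat cmul_two_step(cfloat x, cfloat z) {
    cfloat acc = term_real(x, z);  // step 1: multiply
    acc += term_imag(x, z);        // step 2: multiply-accumulate
    return acc;
}

// Self-check with exactly representable values: (1 + 2i)(3 + 4i) = -5 + 10i.
inline void check_two_step() {
    assert(cmul_two_step(cfloat(1.0f, 2.0f), cfloat(3.0f, 4.0f)) == cfloat(-5.0f, 10.0f));
}
```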
v4cfloat fpmul ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmul ( v4cfloat  xbuf,
v8float  zbuf 
)

Multiply for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmul ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul ( v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmul ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmul ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_c ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
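The conjugate variant differs from the plain real times complex multiply only in that the selected Z value is conjugated before the product. A scalar sketch, with hypothetical names and the assumption that lane 0 uses the least significant nibble of each offsets word:

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;

// Assumption: lane 0 uses the least significant nibble.
inline std::size_t offs_of(std::uint32_t offs, unsigned lane) {
    return (offs >> (4u * lane)) & 0xFu;
}

// Scalar model of ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]).
template <std::size_t NX, std::size_t NZ>
std::array<cfloat, 4> fpmul_c_model(const std::array<float, NX>& xbuf,
                                    int xstart, std::uint32_t xoffs,
                                    const std::array<cfloat, NZ>& zbuf,
                                    int zstart, std::uint32_t zoffs) {
    assert(xstart >= 0 && zstart >= 0);  // negative starts are outside this model
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        const float x  = xbuf.at(static_cast<std::size_t>(xstart) + offs_of(xoffs, i));
        const cfloat z = zbuf.at(static_cast<std::size_t>(zstart) + offs_of(zoffs, i));
        ret[i] = x * std::conj(z);  // conjugation flips the sign of the imaginary part
    }
    return ret;
}
```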
v4cfloat fpmul_c ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_c ( v4float  xbuf,
v4cfloat  zbuf 
)

Multiply for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmul_c ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_c ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmul_c ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmul_c ( v4cfloat  xbuf,
v8float  zbuf 
)

Multiply for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpmul_c ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpmul_cc ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
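Mathematically, conj(x) * conj(z) equals conj(x * z), so the per-lane result of this variant is the conjugate of the plain complex product. The sketch below checks that identity on one exactly representable pair; the function names are hypothetical, and floating-point corner cases such as signed zeros or NaNs are not examined.

```cpp
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;

// Per-lane operation from the pseudocode: conj(x) * conj(z).
inline cfloat mul_cc(cfloat x, cfloat z) {
    return std::conj(x) * std::conj(z);
}

// Equivalent form: conjugate the plain product once.
inline cfloat conj_of_product(cfloat x, cfloat z) {
    return std::conj(x * z);
}

// (1 + 2i)(3 - i) = 5 + 5i, so both forms should give 5 - 5i.
inline void check_identity() {
    const cfloat x(1.0f, 2.0f);
    const cfloat z(3.0f, -1.0f);
    assert(mul_cc(x, z) == conj_of_product(x, z));
}
```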
v4cfloat fpmul_cc ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_cc ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_cc ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice, adding the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. Highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
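The lane selection of this single-buffer form can be sketched as a scalar reference model in plain C++. This is not the intrinsic itself: fpmul_cc_model and lane_offset are hypothetical names, std::complex<float> stands in for the cfloat element type, and the model assumes that lane 0 of xoffs and zoffs sits in the least significant nibble and that indices are not wrapped.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;

// Extract the 4-bit offset of one lane from a packed offsets word.
// Assumption: lane 0 occupies the least significant nibble.
inline unsigned lane_offset(unsigned offs, unsigned lane) {
    const unsigned off = (offs >> (4u * lane)) & 0xFu;
    assert((off & 0x8u) == 0u);  // highest bit per lane must be 0, range [0,7]
    return off;
}

// Scalar model of the pseudocode:
//   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
// Both operands are read from the same buffer.
template <std::size_t N>
std::array<cfloat, 4> fpmul_cc_model(const std::array<cfloat, N>& xbuf,
                                     int xstart, unsigned xoffs,
                                     int zstart, unsigned zoffs) {
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        const std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
        assert(xi < N && zi < N);  // this model does not wrap indices
        ret[i] = std::conj(xbuf[xi]) * std::conj(xbuf[zi]);
    }
    return ret;
}
```

With an 8-element buffer, xstart = 0, zstart = 4 and both offset words set to 0x3210, lane i of the model multiplies the conjugates of elements i and i + 4.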
v4cfloat fpmul_cc ( v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[i]) * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
zbuf: Second multiplication input buffer.
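Because conjugation distributes over multiplication, the result of this form equals the conjugate of the plain complex product. The sketch below checks that identity with a scalar model; fpmul_cc_model and conj_of_product are hypothetical helper names, and std::complex<float> stands in for the cfloat element type.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;
using v4c = std::array<cfloat, 4>;

// Scalar model of the two-argument form: ret[i] = conj(xbuf[i]) * conj(zbuf[i]).
v4c fpmul_cc_model(const v4c& xbuf, const v4c& zbuf) {
    v4c ret{};
    for (std::size_t i = 0; i < ret.size(); ++i)
        ret[i] = std::conj(xbuf[i]) * std::conj(zbuf[i]);
    return ret;
}

// The same values written as the conjugate of the plain product.
v4c conj_of_product(const v4c& xbuf, const v4c& zbuf) {
    v4c ret{};
    for (std::size_t i = 0; i < ret.size(); ++i)
        ret[i] = std::conj(xbuf[i] * zbuf[i]);
    return ret;
}
```

For inputs whose products are exactly representable, both functions return identical values; for general inputs they agree up to floating point rounding.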
v4cfloat fpmul_cc ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
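The note about the two terms of the complex multiplication can be illustrated for a single complex lane. The sketch below splits conj(x) * conj(z) into a multiply pass and a multiply-accumulate pass over a (real, imaginary) lane pair. How the hardware actually groups the terms is not stated here, so this split is only one possible arrangement, and conj_times_conj_two_pass is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;

// With x = a + bi and z = c + di:
//   conj(x) * conj(z) = (a*c - b*d) - (a*d + b*c) i
// acc holds {real lane, imaginary lane}.
cfloat conj_times_conj_two_pass(cfloat x, cfloat z) {
    const float a = x.real(), b = x.imag();
    const float c = z.real(), d = z.imag();
    std::array<float, 2> acc{};
    // Pass 1 (multiply): terms that use the real part of x.
    acc[0] = a * c;
    acc[1] = -(a * d);
    // Pass 2 (multiply-accumulate): terms that use the imaginary part of x.
    acc[0] += -(b * d);
    acc[1] += -(b * c);
    return {acc[0], acc[1]};
}
```

Each pass is a real lane-wise multiply with per-lane sign control, which is the kind of operation the fully configurable intrinsics describe.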
v4cfloat fpmul_cc ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_cn ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_cn ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_cn ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_cn ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_cn ( v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[i]) * zbuf[i]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
zbuf: Second multiplication input buffer.
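A common use of a conjugate-times-normal product is the squared magnitude: when both inputs are the same vector, each lane yields |x|^2 in the real part. The scalar model below illustrates this; fpmul_cn_model is a hypothetical name and std::complex<float> stands in for the cfloat element type.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;
using v4c = std::array<cfloat, 4>;

// Scalar model of the two-argument form: ret[i] = conj(xbuf[i]) * zbuf[i].
v4c fpmul_cn_model(const v4c& xbuf, const v4c& zbuf) {
    v4c ret{};
    for (std::size_t i = 0; i < ret.size(); ++i)
        ret[i] = std::conj(xbuf[i]) * zbuf[i];
    return ret;
}
```

Calling fpmul_cn_model(x, x) with x[0] = 3 + 4i gives 25 + 0i in lane 0. For general inputs the imaginary part may carry a small rounding residue rather than an exact zero, depending on how the compiler evaluates the product.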
v4cfloat fpmul_cn ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_cn ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]]
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs: 4 x 4 bits. Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart: Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs: 4 x 4 bits. Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v8float fpmul_conf ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes because each real and imaginary part of a complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When fpcmp_ge or fpcmp_lt is used in "cmpmode", a bit is set for each lane where the accumulator was chosen.
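The pseudocode for the fpcmp_nrm case can be followed with a scalar reference model. This is a sketch under assumptions, not the intrinsic: AddMode is a hypothetical stand-in for the fpadd_* selectors (their numeric values are not given here), fpmul_conf_nrm_model and lane_offset are hypothetical names, lane i of an offsets word is assumed to be nibble i counted from the least significant end, and the fpcmp_lt and fpcmp_ge branches are left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for fpadd_add, fpadd_sub, fpadd_mixadd, fpadd_mixsub.
enum class AddMode { add, sub, mixadd, mixsub };

// neg = addmask ^ pattern; bit i controls lane i.
unsigned neg_mask(AddMode mode, unsigned addmask) {
    unsigned pattern = 0x00u;
    switch (mode) {
        case AddMode::add:    pattern = 0x00u; break;
        case AddMode::sub:    pattern = 0xFFu; break;
        case AddMode::mixadd: pattern = 0xAAu; break;
        case AddMode::mixsub: pattern = 0x55u; break;
    }
    return (addmask ^ pattern) & 0xFFu;
}

// Assumption: lane 0 occupies the least significant nibble.
unsigned lane_offset(unsigned offs, unsigned lane) {
    return (offs >> (4u * lane)) & 0xFu;
}

// fpcmp_nrm path only: ret[i] = o[i], bit i of cmp = signbit(o[i]).
std::array<float, 8> fpmul_conf_nrm_model(
        const std::array<float, 8>& xbuf, int xstart, unsigned xoffs,
        const std::array<float, 8>& zbuf, int zstart, unsigned zoffs,
        bool ones, bool take_abs, AddMode addmode, unsigned addmask,
        unsigned& cmp) {
    const unsigned neg = neg_mask(addmode, addmask);
    std::array<float, 8> ret{};
    cmp = 0u;
    for (unsigned i = 0; i < 8; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        const std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
        assert(xi < xbuf.size());          // this model does not wrap indices
        assert(ones || zi < zbuf.size());
        const float m = xbuf[xi] * (ones ? 1.0f : zbuf[zi]);
        float n = take_abs ? std::fabs(m) : m;
        if ((neg >> i) & 1u) n = -n;
        const float o = 0.0f + n;
        if (std::signbit(o)) cmp |= 1u << i;
        ret[i] = o;
    }
    return ret;
}
```

With ones and take_abs both set and AddMode::sub, every lane becomes the negated absolute value of the selected X element, so all eight bits of cmp are set for non-zero inputs.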
v8float fpmul_conf ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes because each real and imaginary part of a complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
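The interaction between addmode and addmask is an XOR, so addmask toggles lanes relative to the pattern that addmode selects. The sketch below models only the sign step with ones = true and abs = false. The names apply_neg, kAdd, kSub, kMixAdd and kMixSub are hypothetical, and the reading of 0xAA as "negate the imaginary parts" assumes that even lanes hold real parts and odd lanes hold imaginary parts.

```cpp
#include <array>
#include <cassert>

// XOR patterns listed in the pseudocode for the four addmode values.
constexpr unsigned kAdd = 0x00u;
constexpr unsigned kSub = 0xFFu;
constexpr unsigned kMixAdd = 0xAAu;
constexpr unsigned kMixSub = 0x55u;

// Models n[i] = (-1)^neg[i] * x[i]: lane i is negated when bit i of neg is set.
std::array<float, 8> apply_neg(const std::array<float, 8>& x, unsigned neg) {
    assert(neg <= 0xFFu);  // only the 8 least significant bits are meaningful
    std::array<float, 8> out{};
    for (unsigned i = 0; i < 8; ++i)
        out[i] = ((neg >> i) & 1u) ? -x[i] : x[i];
    return out;
}
```

With addmask = 0x00, the kMixAdd pattern negates lanes 1, 3, 5 and 7, which under the interleaving assumption conjugates four complex values. With addmask = 0xAA the two patterns cancel and no lane is negated.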
v8float fpmul_conf ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes because each real and imaginary part of a complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zstart: Starting offset for all lanes of the second multiplicand.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When fpcmp_ge or fpcmp_lt is used in "cmpmode", a bit is set for each lane where the accumulator was chosen.
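The xoffs and zoffs arguments pack eight 4-bit lane offsets into one unsigned int. A small helper makes such words easier to build and inspect. pack_offsets and unpack_offsets are hypothetical names, and the sketch assumes that lane 0 is stored in the least significant nibble.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack eight per-lane offsets (each 0..15) into one 32-bit word.
// Assumption: lane 0 is stored in the least significant nibble.
std::uint32_t pack_offsets(const std::array<unsigned, 8>& lanes) {
    std::uint32_t word = 0u;
    for (unsigned i = 0; i < 8; ++i) {
        assert(lanes[i] <= 0xFu);  // each offset must fit in 4 bits
        word |= static_cast<std::uint32_t>(lanes[i]) << (4u * i);
    }
    return word;
}

// Inverse of pack_offsets: split a word into its eight 4-bit lane offsets.
std::array<unsigned, 8> unpack_offsets(std::uint32_t word) {
    std::array<unsigned, 8> lanes{};
    for (unsigned i = 0; i < 8; ++i)
        lanes[i] = (word >> (4u * i)) & 0xFu;
    return lanes;
}
```

Under this convention the identity selection {0, 1, 2, 3, 4, 5, 6, 7} packs to 0x76543210, and the pair-duplicating selection {0, 0, 1, 1, 2, 2, 3, 3} packs to 0x33221100.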
v8float fpmul_conf ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes because each real and imaginary part of a complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zstart: Starting offset for all lanes of the second multiplicand.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes because each real and imaginary part of a complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When fpcmp_ge or fpcmp_lt is used in "cmpmode", a bit is set for each lane where the accumulator was chosen.
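One way to read this real-times-complex form is as a lane-wise scaling of complex values by real factors. The scalar sketch below assumes that the complex zbuf is addressed as eight interleaved real lanes (re0, im0, re1, im1, ...) and that lane 0 of an offsets word is the least significant nibble; neither detail is spelled out above. scale_by_real_model and lane_offset are hypothetical names, and xstart and zstart are taken as 0.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;

// Assumption: lane 0 occupies the least significant nibble.
unsigned lane_offset(unsigned offs, unsigned lane) {
    return (offs >> (4u * lane)) & 0xFu;
}

// Lane-wise product o[i] = x[xoffs[i]] * zlanes[zoffs[i]] over 8 real lanes,
// then regrouped into 4 complex values.
std::array<cfloat, 4> scale_by_real_model(const std::array<float, 8>& xbuf, unsigned xoffs,
                                          const std::array<cfloat, 4>& zbuf, unsigned zoffs) {
    // View zbuf as interleaved real lanes: re0, im0, re1, im1, ...
    std::array<float, 8> zlanes{};
    for (std::size_t k = 0; k < 4; ++k) {
        zlanes[2 * k] = zbuf[k].real();
        zlanes[2 * k + 1] = zbuf[k].imag();
    }
    std::array<float, 8> o{};
    for (unsigned i = 0; i < 8; ++i) {
        const unsigned xi = lane_offset(xoffs, i);
        const unsigned zi = lane_offset(zoffs, i);
        assert(xi < 8u && zi < 8u);  // this model does not wrap indices
        o[i] = xbuf[xi] * zlanes[zi];
    }
    std::array<cfloat, 4> ret{};
    for (std::size_t k = 0; k < 4; ++k)
        ret[k] = cfloat(o[2 * k], o[2 * k + 1]);
    return ret;
}
```

With xoffs = 0x33221100 each real factor is used for both the real and the imaginary lane of one complex value, and zoffs = 0x76543210 reads the complex lanes in order, so lane k of the result is x[k] * z[k].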
v4cfloat fpmul_conf ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes because each real and imaginary part of a complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v8float fpmul_conf ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes because each real and imaginary part of a complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp: 8 x 1 LSB bits. When fpcmp_ge or fpcmp_lt is used in "cmpmode", a bit is set for each lane where the accumulator was chosen.
v8float fpmul_conf ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes because each real and imaginary part of a complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf: First multiplication input buffer.
xstart: Starting offset for all lanes of X.
xoffs: 8 x 4 bits. Additional lane-dependent offset for X.
zbuf: Second multiplication input buffer.
zstart: Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs: 8 x 4 bits. Additional lane-dependent offset for the second multiplicand.
ones: If true, the second multiplicand is replaced with 1.0.
abs: If true, the absolute value is taken before accumulation.
addmode: Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask: 8 x 1 LSB bits. The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode: Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v8float fpmul_conf ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can always be considered to have 8 lanes because each real and imaginary part of a complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane where the accumulator was chosen.
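The first four lines of the pseudocode combine addmode and addmask into a single per-lane negation mask. The following is a minimal C++ sketch of that step only. It assumes a hypothetical AddMode enum in place of the real fpadd_* constants (their encodings are not given on this page) and uses plain integer helpers, not intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for fpadd_add, fpadd_sub, fpadd_mixadd, fpadd_mixsub.
// The real constants come from the intrinsics headers; their values are not assumed here.
enum class AddMode { add, sub, mixadd, mixsub };

// Returns the 8-bit negation mask: bit i set means lane i is negated.
inline std::uint8_t negation_mask(AddMode mode, std::uint8_t addmask) {
    std::uint8_t pattern = 0x00;
    switch (mode) {
        case AddMode::add:    pattern = 0x00; break;  // negate only lanes set in addmask
        case AddMode::sub:    pattern = 0xFF; break;  // negate all lanes except those set in addmask
        case AddMode::mixadd: pattern = 0xAA; break;  // odd lanes negated by default
        case AddMode::mixsub: pattern = 0x55; break;  // even lanes negated by default
    }
    return static_cast<std::uint8_t>(addmask ^ pattern);
}

// True if lane `lane` (0..7) is negated under mask `neg`.
inline bool lane_negated(std::uint8_t neg, unsigned lane) {
    assert(lane < 8);
    return ((neg >> lane) & 1u) != 0;
}
```

Since addmask is XORed with the mode pattern, a set bit in addmask flips the default behaviour of that lane rather than forcing a negation.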
v8float fpmul_conf ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
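The "8 x 4 bits" layout of xoffs and zoffs can be unpacked as sketched below. This assumes that lane 0 uses the least significant nibble, which this page does not state explicitly. lane_offset and lane_indices are hypothetical helper names, and any index wrapping performed by the hardware is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: the 4-bit offset of lane i sits in bits [4*i+3 : 4*i] of the word.
inline unsigned lane_offset(std::uint32_t offs, unsigned lane) {
    assert(lane < 8);
    return (offs >> (4u * lane)) & 0xFu;
}

// Element index read by each of the 8 lanes: common start plus per-lane offset,
// mirroring the "xstart + xoffs[i]" term of the pseudocode.
inline std::array<int, 8> lane_indices(int start, std::uint32_t offs) {
    std::array<int, 8> idx{};
    for (unsigned i = 0; i < 8; ++i) {
        idx[i] = start + static_cast<int>(lane_offset(offs, i));
    }
    return idx;
}
```

Under this assumption, an offsets word of 0x76543210 selects eight consecutive elements beginning at the start index, and 0x00000000 broadcasts the element at the start index to all lanes.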
v4cfloat fpmul_conf ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane where the accumulator was chosen.
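Because a v4cfloat result is treated as 8 float lanes, the 0xAA and 0x55 patterns of fpadd_mixadd and fpadd_mixsub act on alternating parts of each complex value. The sketch below illustrates this under the assumption that real parts occupy even lanes and imaginary parts occupy odd lanes, a layout this page does not state. as_complex and apply_negation are hypothetical helpers, not intrinsics.

```cpp
#include <array>
#include <complex>
#include <cstdint>

// Assumption: complex element k occupies float lanes 2*k (real) and 2*k+1 (imaginary).
inline std::array<std::complex<float>, 4> as_complex(const std::array<float, 8>& lanes) {
    std::array<std::complex<float>, 4> out{};
    for (unsigned k = 0; k < 4; ++k) {
        out[k] = std::complex<float>(lanes[2 * k], lanes[2 * k + 1]);
    }
    return out;
}

// Applies a per-lane negation mask: bit i set means lane i is negated.
inline std::array<float, 8> apply_negation(std::array<float, 8> lanes, std::uint8_t neg) {
    for (unsigned i = 0; i < 8; ++i) {
        if ((neg >> i) & 1u) {
            lanes[i] = -lanes[i];
        }
    }
    return lanes;
}
```

With that lane layout, a mask of 0xAA negates only the imaginary parts, so each complex value becomes its conjugate, while 0x55 negates only the real parts.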
v4cfloat fpmul_conf ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v8float fpmul_conf ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane where the accumulator was chosen.
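For checking results on a host machine, the fpcmp_nrm branch of the pseudocode can be mirrored by a scalar reference model. The sketch below is such a model and not the intrinsic itself. fpmul_nrm_model is a hypothetical name, the buffers are plain std::vector<float>, the negation mask neg is passed in already combined from addmode and addmask, and the nibble order of the offsets (lane 0 in the least significant nibble) is an assumption.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of the fpcmp_nrm path only (hypothetical helper, not an intrinsic).
// Assumes all computed indices are non-negative; out-of-range indices throw via at().
inline std::array<float, 8> fpmul_nrm_model(const std::vector<float>& xbuf, int xstart, std::uint32_t xoffs,
                                            const std::vector<float>& zbuf, int zstart, std::uint32_t zoffs,
                                            bool ones, bool take_abs, std::uint8_t neg) {
    std::array<float, 8> ret{};
    for (unsigned i = 0; i < 8; ++i) {
        // Per-lane element indices: common start plus the 4-bit offset of lane i.
        const std::size_t xi = static_cast<std::size_t>(xstart + static_cast<int>((xoffs >> (4u * i)) & 0xFu));
        const std::size_t zi = static_cast<std::size_t>(zstart + static_cast<int>((zoffs >> (4u * i)) & 0xFu));
        const float m = xbuf.at(xi) * (ones ? 1.0f : zbuf.at(zi));  // m[i]
        float n = take_abs ? std::fabs(m) : m;                      // optional |m[i]|
        if ((neg >> i) & 1u) {
            n = -n;                                                 // (-1)^neg[i]
        }
        ret[i] = 0.0f + n;                                          // o[i] = 0.0 + n[i]
    }
    return ret;
}
```

The model leaves out the fpcmp_lt and fpcmp_ge branches and the cmp output, so it is only suitable for comparing plain multiply results.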
v8float fpmul_conf ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v8float fpmul_conf ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane where the accumulator was chosen.
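The pseudocode derives each cmp bit from the sign bit of o[i]. The sketch below shows how such an 8-bit mask could be formed from eight scalar results on the host, for example to cross-check the cmp output. sign_mask and not_sign_mask are hypothetical helpers, and placing lane i in bit i is an assumption.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Packs signbit(o[i]) into bit i, as in the fpcmp_nrm and fpcmp_lt branches.
// Note that std::signbit is also true for -0.0f.
inline std::uint8_t sign_mask(const std::array<float, 8>& o) {
    std::uint8_t mask = 0;
    for (unsigned i = 0; i < 8; ++i) {
        if (std::signbit(o[i])) {
            mask = static_cast<std::uint8_t>(mask | (1u << i));
        }
    }
    return mask;
}

// fpcmp_ge branch: the bitwise complement, restricted to the 8 lane bits.
inline std::uint8_t not_sign_mask(const std::array<float, 8>& o) {
    return static_cast<std::uint8_t>(~sign_mask(o) & 0xFFu);
}
```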
v8float fpmul_conf ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane where the accumulator was chosen.
v4cfloat fpmul_conf ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane where the accumulator was chosen.
v4cfloat fpmul_conf ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane where the accumulator was chosen.
v4cfloat fpmul_conf ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane where the accumulator was chosen.
v4cfloat fpmul_conf ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zstart   Starting offset for all lanes of the second multiplicand.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf     First multiplication input buffer.
xstart   Starting offset for all lanes of X.
xoffs    8 x 4 bits: Additional lane-dependent offset for X.
zbuf     Second multiplication input buffer.
zstart   Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs    8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones     If true, the second multiplicand is replaced with 1.0.
abs      If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if the bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum between accumulation and multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp      8 x 1 LSB bits: When "cmpmode" is fpcmp_ge or fpcmp_lt, a bit is set for each lane where the accumulator was chosen.
v4cfloat fpmul_conf ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because each real and imaginary part of the complex float is handled as a separate lane.

if (addmode == fpadd_add   ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub   ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
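The sign handling in the pseudocode (addmode, addmask and abs) can be modelled in scalar C++ as a rough sketch. The names AddMode, negation_bits and apply_sign are hypothetical and are not part of the intrinsics API. The sketch also assumes that bit i of the mask corresponds to lane i.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the fpadd_* constants; the real values come from
// the intrinsics headers and are not reproduced here.
enum class AddMode { Add, Sub, MixAdd, MixSub };

// Per-lane negation bits: addmask XOR a mode-dependent pattern,
// mirroring the four "neg = addmask ^ ..." lines of the pseudocode.
std::uint8_t negation_bits(AddMode mode, std::uint8_t addmask) {
    std::uint8_t pattern = 0x00;
    switch (mode) {
        case AddMode::Add:    pattern = 0x00; break;  // no lane flipped
        case AddMode::Sub:    pattern = 0xFF; break;  // every lane flipped
        case AddMode::MixAdd: pattern = 0xAA; break;  // odd lanes flipped
        case AddMode::MixSub: pattern = 0x55; break;  // even lanes flipped
    }
    return static_cast<std::uint8_t>(addmask ^ pattern);
}

// n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i]) for the eight real lanes.
// Assumption: bit i of neg belongs to lane i.
std::array<float, 8> apply_sign(const std::array<float, 8>& m,
                                bool take_abs, std::uint8_t neg) {
    std::array<float, 8> n{};
    for (unsigned i = 0; i < 8; ++i) {
        const float v = take_abs ? std::fabs(m[i]) : m[i];
        n[i] = ((neg >> i) & 1u) ? -v : v;
    }
    return n;
}
```

Under these assumptions, fpadd_mixadd with an addmask of 0 flips only the odd lanes, and fpadd_sub with an addmask of 0 flips all eight.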
v4cfloat fpmul_conf ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp  8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, a bit is set for each lane in which the accumulator was chosen.
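The "8 x 4 bits" offset words used by xoffs and zoffs pack one 4-bit offset per lane. A minimal sketch of how such a word could be decoded follows. It assumes that lane 0 occupies the least significant nibble, and the helper names lane_offset and source_index are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Offset of one lane: the 4-bit field at bit position 4 * lane.
// Assumption: lane 0 is stored in the least significant nibble.
unsigned lane_offset(std::uint32_t offs, unsigned lane) {
    assert(lane < 8);  // 8 x 4 bits fill exactly one 32-bit word
    return (offs >> (4u * lane)) & 0xFu;
}

// Element index read for a lane: start + offs[lane], as in the pseudocode.
int source_index(int start, std::uint32_t offs, unsigned lane) {
    return start + static_cast<int>(lane_offset(offs, lane));
}
```

Under that assumption, an offset word of 0x76543210 with a start of 0 makes lane i read element i.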
v4cfloat fpmul_conf ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for all lanes of the second multiplicand.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp  8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmul_conf ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for all lanes of the second multiplicand.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp  8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmul_conf ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp  8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmul_conf ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : zbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of the second multiplicand. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_conf ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode,
unsigned int &  cmp 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    cmp[i] = signbit(o[i])
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    cmp[i] = signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    cmp[i] = ~signbit(o[i])
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for all lanes of the second multiplicand.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
cmp  8 x 1 LSB bits: When cmpmode is fpcmp_ge or fpcmp_lt, a bit is set for each lane in which the accumulator was chosen.
v4cfloat fpmul_conf ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs,
bool  ones,
bool  abs,
unsigned int  addmode,
unsigned int  addmask,
unsigned int  cmpmode 
)

Fully configurable multiply for single precision floating point vectors.

The output can be considered to always have 8 lanes because the real and imaginary parts of each complex float are handled as separate lanes.

if (addmode == fpadd_add ) neg = addmask ^ 0x00;
if (addmode == fpadd_sub ) neg = addmask ^ 0xFF;
if (addmode == fpadd_mixadd) neg = addmask ^ 0xAA;
if (addmode == fpadd_mixsub) neg = addmask ^ 0x55;
for (i = 0; i < 8; i++)
  m[i] = xbuf[xstart + xoffs[i]] * (ones ? 1.0 : xbuf[zstart + zoffs[i]])
  n[i] = (-1)^neg[i] * (abs ? |m[i]| : m[i])
  o[i] = 0.0 + n[i]
  if cmpmode == fpcmp_nrm :
    ret[i] = o[i]
  elif cmpmode == fpcmp_lt :
    ret[i] = cmp[i] ? -n[i] : acc[i]
  elif cmpmode == fpcmp_ge :
    ret[i] = cmp[i] ? -n[i] : acc[i]
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for all lanes of the second multiplicand.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand.
ones  If true, the second multiplicand is replaced with 1.0.
abs  If true, the absolute value is taken before accumulation.
addmode  Select one of fpadd_add, fpadd_sub, fpadd_mixadd or fpadd_mixsub. This must be a compile-time constant.
addmask  8 x 1 LSB bits: The corresponding lane is negated if its bit is set (depending on addmode).
cmpmode  Use "fpcmp_lt" to select the minimum of the accumulator and the multiplication result per lane, "fpcmp_ge" for the maximum and "fpcmp_nrm" for the usual sum.
v4cfloat fpmul_nc ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
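The note on the two-cycle behavior can be illustrated with the arithmetic alone: x * conj(z) splits into one real-multiply pass and one multiply-accumulate pass. The sketch below shows only that algebraic split for a single complex lane. CFloat and mul_conj_two_pass are hypothetical names, and the order in which the hardware actually issues the two terms is not stated in this page.

```cpp
#include <cassert>

struct CFloat { float re; float im; };  // hypothetical single complex lane

// x * conj(z) = (x.re*z.re + x.im*z.im) + i*(x.im*z.re - x.re*z.im)
CFloat mul_conj_two_pass(CFloat x, CFloat z) {
    // Pass 1 (plain multiply): both parts of x times the real part of z.
    CFloat acc{x.re * z.re, x.im * z.re};
    // Pass 2 (multiply-accumulate): swapped parts of x times the imaginary
    // part of z, added on the real lane and subtracted on the imaginary lane.
    acc.re += x.im * z.im;
    acc.im -= x.re * z.im;
    return acc;
}
```

For example, (1 + 2i) * conj(3 + 4i) evaluates to 11 + 2i with this split.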
v4cfloat fpmul_nc ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_nc ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_nc ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_nc ( v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[i] * conj(zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
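As a cross-check for the lane-wise form above, the pseudocode ret[i] = xbuf[i] * conj(zbuf[i]) can be written as a scalar reference model with std::complex. This is only an illustrative sketch: cfloat and mul_conj_ref are hypothetical names standing in for the vector types and the intrinsic.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;  // stand-in for one complex lane

// Scalar model of the four-lane complex times complex-conjugate multiply.
std::array<cfloat, 4> mul_conj_ref(const std::array<cfloat, 4>& x,
                                   const std::array<cfloat, 4>& z) {
    std::array<cfloat, 4> ret{};
    for (std::size_t i = 0; i < 4; ++i) {
        ret[i] = x[i] * std::conj(z[i]);
    }
    return ret;
}
```

A model like this can be useful for comparing kernel output against host-side expectations, keeping in mind that rounding on the device may differ from the host.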
v4cfloat fpmul_nc ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpmul_nc ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic, so two microcode instructions are scheduled. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic twice to add the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset of the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset of the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v8float fpneg_abs_mul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpneg_abs_mul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset of the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset of the second multiplicand in X.
v8float fpneg_abs_mul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
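Because each lane offset occupies 4 bits, an offset word can be assembled from eight per-lane values. The helper below is a hypothetical sketch of that packing, assuming lane 0 occupies the least significant nibble. It is not part of the intrinsics API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack eight per-lane offsets (each 0..15) into one word.
// Assumption: lane 0 occupies the least significant nibble.
inline std::uint32_t pack_offsets(const std::array<unsigned, 8>& lanes) {
    std::uint32_t word = 0;
    for (unsigned i = 0; i < 8; ++i) {
        assert(lanes[i] < 16);  // each offset must fit in 4 bits
        word |= static_cast<std::uint32_t>(lanes[i]) << (4u * i);
    }
    return word;
}
```

Under that assumption, packing the offsets 0 through 7 gives `0x76543210`. Combined with `xstart = 8`, the pseudocode above would then read elements 8 to 15 of a 16-lane `xbuf`.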
v8float fpneg_abs_mul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v8float fpneg_abs_mul ( v8float  xbuf,
v8float  zbuf 
)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - abs(xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
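Read against the pseudocode, the element-wise form gives the same result as the offset form with both start values 0 and offsets that map lane i to element i. The sketch below checks that relationship on scalar models. The function names are hypothetical, the nibble order (lane 0 in the least significant 4 bits) is an assumption, and nothing here says how the wrapper is actually implemented.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

using vec8 = std::array<float, 8>;  // stand-in for v8float

// Element-wise form: ret[i] = -abs(xbuf[i] * zbuf[i]).
vec8 ref_neg_abs_mul(const vec8& xbuf, const vec8& zbuf) {
    vec8 ret{};
    for (unsigned i = 0; i < 8; ++i) ret[i] = -std::fabs(xbuf[i] * zbuf[i]);
    return ret;
}

// Offset form, assuming lane 0 uses the least significant nibble.
vec8 ref_neg_abs_mul_offs(const vec8& xbuf, unsigned xstart, std::uint32_t xoffs,
                          const vec8& zbuf, unsigned zstart, std::uint32_t zoffs) {
    vec8 ret{};
    for (unsigned i = 0; i < 8; ++i) {
        const unsigned xi = xstart + ((xoffs >> (4u * i)) & 0xFu);
        const unsigned zi = zstart + ((zoffs >> (4u * i)) & 0xFu);
        assert(xi < 8 && zi < 8);  // out-of-range selection is not modeled
        ret[i] = -std::fabs(xbuf[xi] * zbuf[zi]);
    }
    return ret;
}
```

In this model, `ref_neg_abs_mul(x, z)` equals `ref_neg_abs_mul_offs(x, 0, 0x76543210, z, 0, 0x76543210)` for any inputs.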
v8float fpneg_abs_mul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpneg_abs_mul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply, take absolute value and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - abs(xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v8float fpneg_mul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
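The per-lane arithmetic of fpneg_mul differs from that of fpneg_abs_mul only in the missing absolute value. The two helpers below are hypothetical scalar stand-ins that make the difference concrete. They model the pseudocode only, not the hardware's exception or rounding behavior.

```cpp
#include <cassert>
#include <cmath>

// Per-lane arithmetic of fpneg_mul: -(x * z).
inline float lane_neg_mul(float x, float z) { return -(x * z); }

// Per-lane arithmetic of fpneg_abs_mul: -abs(x * z).
inline float lane_neg_abs_mul(float x, float z) { return -std::fabs(x * z); }
```

The two agree whenever the product is non-negative. When the factors have opposite signs, `lane_neg_mul` returns a positive value while `lane_neg_abs_mul` is never positive.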
v8float fpneg_mul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v4cfloat fpneg_mul ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
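For the real times complex form, the X offsets count real lanes while the Z offsets count complex lanes. The following scalar sketch shows this with `std::complex<float>` standing in for a cfloat lane. The name `ref_fpneg_mul_rc` is hypothetical, and the nibble order (lane 0 in the least significant 4 bits) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

using cfloat = std::complex<float>;  // stand-in for one cfloat lane

// Scalar model of ret[i] = -(xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
// for 8 real X lanes and 4 complex Z lanes.
// Assumption: lane 0 uses the least significant nibble of each offset word.
std::array<cfloat, 4> ref_fpneg_mul_rc(const std::array<float, 8>& xbuf,
                                       unsigned xstart, std::uint32_t xoffs,
                                       const std::array<cfloat, 4>& zbuf,
                                       unsigned zstart, std::uint32_t zoffs) {
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        const std::size_t xi = xstart + ((xoffs >> (4u * i)) & 0xFu);  // real lane index
        const std::size_t zi = zstart + ((zoffs >> (4u * i)) & 0xFu);  // complex lane index
        assert(xi < xbuf.size() && zi < zbuf.size());  // out-of-range selection is not modeled
        ret[i] = -(xbuf[xi] * zbuf[zi]);
    }
    return ret;
}
```

With `xoffs = 0x3210` and `zoffs = 0`, each output lane of this model is the negated product of `xbuf[i]` with the first complex element of `zbuf`.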
v8float fpneg_mul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpneg_mul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v8float fpneg_mul ( v8float  xbuf,
v8float  zbuf 
)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - (xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpneg_mul ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
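The constraint on complex offsets (highest bit of each 4-bit lane must be 0) can be checked before a call when the offsets are computed at run time. The helper below is a hypothetical sketch of such a check. It inspects only the four nibbles that the parameter description mentions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check for a 4 x 4-bit complex offset word: the highest bit of
// each nibble must be 0, which limits every lane to the range [0,7].
inline bool complex_offsets_valid(std::uint32_t offs) {
    return (offs & 0x8888u) == 0u;
}
```

For example, `0x3210` and `0x7777` pass this check, while `0x8210` does not because its fourth lane has the value 8.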
v4cfloat fpneg_mul ( v4float  xbuf,
v4cfloat  zbuf 
)

Multiply and negate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v8float fpneg_mul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 8; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  8 x 4 bits: Additional lane-dependent offset for Z.
v8float fpneg_mul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 8 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  8 x 4 bits: Additional lane-dependent offset for X.
zstart  Starting offset for the second multiplicand for all lanes of X.
zoffs  8 x 4 bits: Additional lane-dependent offset for the second multiplicand in X.
v4cfloat fpneg_mul ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpneg_mul ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, accumulating the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
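The note about two terms can be read against the algebra of a complex product: x * z = Re(x) * z + Im(x) * (i * z). The sketch below accumulates the negated product in two such steps on a single lane. It is only one mathematically valid split, written with the hypothetical name `neg_cmul_two_terms`. How the hardware or fpmac_conf actually divides the work is not stated here.

```cpp
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;  // stand-in for one cfloat lane

// Negated complex product built from two real-times-complex terms.
inline cfloat neg_cmul_two_terms(cfloat x, cfloat z) {
    // Term 1: -(Re(x) * z)
    cfloat acc(-(x.real() * z.real()), -(x.real() * z.imag()));
    // Term 2: subtract Im(x) * (i * z), where i * z = (-Im(z), Re(z))
    acc = cfloat(acc.real() + x.imag() * z.imag(),
                 acc.imag() - x.imag() * z.real());
    return acc;
}
```

For x = 1 + 2i and z = 3 + 4i the product is -5 + 10i, so the function returns 5 - 10i.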
v4cfloat fpneg_mul ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, accumulating the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpneg_mul ( v4cfloat  xbuf,
v8float  zbuf 
)

Multiply and negate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpneg_mul ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, accumulating the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, accumulating the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul ( v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[i] * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, accumulating the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpneg_mul ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpneg_mul ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, accumulating the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic that is scheduled as two microcode instructions. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic (fpmac_conf) twice, accumulating the two terms of the complex multiplication.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_c ( v8float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
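For a real X lane, conjugating Z before the multiplication is the same as conjugating the result of the non-conjugating form. The per-lane helpers below are hypothetical scalar stand-ins that illustrate this relationship. They model only the arithmetic in the pseudocode.

```cpp
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;  // stand-in for one cfloat lane

// Per-lane arithmetic of fpneg_mul_c with a real X lane: -(x * conj(z)).
inline cfloat lane_neg_mul_c(float x, cfloat z) { return -(x * std::conj(z)); }

// Per-lane arithmetic of fpneg_mul with a real X lane: -(x * z).
inline cfloat lane_neg_mul(float x, cfloat z) { return -(x * z); }
```

For x = 2 and z = 1 - 3i, `lane_neg_mul_c` gives -2 - 6i, which is the conjugate of the -2 + 6i returned by `lane_neg_mul`.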
v4cfloat fpneg_mul_c ( v16float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_c ( v4float  xbuf,
v4cfloat  zbuf 
)

Multiply and negate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[i] * conj(zbuf[i]))
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpneg_mul_c ( v32float  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision real times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X.
xoffs  4 x 4 bits: Additional lane-dependent offset for X.
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_c ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpneg_mul_c ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpneg_mul_c ( v4cfloat  xbuf,
v8float  zbuf 
)

Multiply and negate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[i]) * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
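In this overload zbuf holds eight real values but the pseudocode loop reads only `zbuf[i]` for i < 4. The scalar sketch below mirrors that, with the hypothetical name `ref_fpneg_mul_c_cr` and `std::complex<float>` standing in for a cfloat lane.

```cpp
#include <array>
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;  // stand-in for one cfloat lane

// Scalar model of ret[i] = -(conj(xbuf[i]) * zbuf[i]) for i < 4.
// Only the first four of the eight real Z values are read, as in the pseudocode.
std::array<cfloat, 4> ref_fpneg_mul_c_cr(const std::array<cfloat, 4>& xbuf,
                                         const std::array<float, 8>& zbuf) {
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        ret[i] = -(std::conj(xbuf[i]) * zbuf[i]);
    }
    return ret;
}
```

For `xbuf[0] = 1 + 2i` and `zbuf[0] = 3`, lane 0 of this model is -3 + 6i.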
v4cfloat fpneg_mul_c ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v8float  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times real floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0,7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant.
zoffs  4 x 4 bits: Additional lane-dependent offset for Z.
v4cfloat fpneg_mul_cc ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
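The note about two instructions reflects the fact that a complex-times-complex product consists of two terms, one per component of the second operand. The sketch below shows only that arithmetic on scalars. It does not reproduce how fpmac_conf would actually be configured, and the names `cfloat`, `neg_mul_cc_direct` and `neg_mul_cc_two_pass` are hypothetical.

```cpp
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;

// Direct form of one output lane: -(conj(x) * conj(z)).
cfloat neg_mul_cc_direct(cfloat x, cfloat z) {
    return -(std::conj(x) * std::conj(z));
}

// Two-pass form: one pass per component of z. Each pass only needs
// real-by-real products of the components of x and z.
cfloat neg_mul_cc_two_pass(cfloat x, cfloat z) {
    const float a = x.real(), b = x.imag();
    const float c = z.real(), d = z.imag();
    // Pass 1, contribution of Re(z): -(conj(x) * c)      = (-a*c) + (b*c)i
    cfloat acc(-a * c, b * c);
    // Pass 2, contribution of Im(z): -(conj(x) * (-d*i)) = (b*d) + (a*d)i
    acc += cfloat(b * d, a * d);
    return acc;
}
```

For x = 1 + 2i and z = 3 + 4i both forms give 5 + 10i, which matches the expansion -(conj(x) * conj(z)) = (bd - ac) + (ad + bc)i.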
v4cfloat fpneg_mul_cc ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  Input buffer supplying both multiplication operands.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand, applied to all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
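In this overload both operands are read from xbuf, and zstart and zoffs select the second operand from that same buffer. A host-side scalar sketch of the pseudocode above is given below. The names `cfloat`, `lane_offset` and `fpneg_mul_cc_self_model` are hypothetical, lane 0 is assumed to occupy the least significant nibble of each offset word, and hardware wrap-around is not modeled.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;

// Hypothetical helper: 4-bit offset of lane `lane`, assuming lane 0 is the
// least significant nibble of the packed word.
inline unsigned lane_offset(unsigned packed, unsigned lane) {
    return (packed >> (4u * lane)) & 0xFu;
}

// Scalar model of
//   ret[i] = -(conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]]))
// where both operands come from the same buffer.
template <std::size_t N>
std::array<cfloat, 4> fpneg_mul_cc_self_model(const std::array<cfloat, N>& xbuf,
                                              int xstart, unsigned xoffs,
                                              int zstart, unsigned zoffs) {
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        const std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
        assert(xi < N && zi < N);  // no hardware wrap-around is modeled
        ret[i] = -(std::conj(xbuf[xi]) * std::conj(xbuf[zi]));
    }
    return ret;
}
```

With `xoffs = 0x3210` and `zoffs = 0x0000`, every lane of the buffer is multiplied by the element at `zstart`, which is one way to scale a vector by one of its own entries.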
v4cfloat fpneg_mul_cc ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_cc ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  Input buffer supplying both multiplication operands.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand, applied to all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_cc ( v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[i]) * conj(zbuf[i]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpneg_mul_cc ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_cc ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * conj(xbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  Input buffer supplying both multiplication operands.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand, applied to all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_cn ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
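The conjugate-times-normal form expands to -(conj(x) * z) = -(ac + bd) + (bc - ad)i for x = a + bi and z = c + di, so passing the same data as both operands yields the negated squared magnitude of each element. The scalar sketch below illustrates this on a host compiler. The names `cfloat`, `lane_offset` and `fpneg_mul_cn_model` are hypothetical, lane 0 is assumed to be the least significant nibble of each offset word, and wrap-around is not modeled.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;

// Hypothetical helper: 4-bit offset of lane `lane`, assuming lane 0 is the
// least significant nibble of the packed word.
inline unsigned lane_offset(unsigned packed, unsigned lane) {
    return (packed >> (4u * lane)) & 0xFu;
}

// Scalar model of
//   ret[i] = -(conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
// for complex X and complex Z buffers.
template <std::size_t NX, std::size_t NZ>
std::array<cfloat, 4> fpneg_mul_cn_model(const std::array<cfloat, NX>& xbuf,
                                         int xstart, unsigned xoffs,
                                         const std::array<cfloat, NZ>& zbuf,
                                         int zstart, unsigned zoffs) {
    std::array<cfloat, 4> ret{};
    for (unsigned i = 0; i < 4; ++i) {
        const std::size_t xi = static_cast<std::size_t>(xstart) + lane_offset(xoffs, i);
        const std::size_t zi = static_cast<std::size_t>(zstart) + lane_offset(zoffs, i);
        assert(xi < NX && zi < NZ);  // no hardware wrap-around is modeled
        ret[i] = -(std::conj(xbuf[xi]) * zbuf[zi]);
    }
    return ret;
}
```

Calling the model with identical buffers and identical offsets returns purely real lanes equal to -|x[i]|^2, for example -5 for the element 1 + 2i.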
v4cfloat fpneg_mul_cn ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  Input buffer supplying both multiplication operands.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand, applied to all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_cn ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_cn ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  Input buffer supplying both multiplication operands.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand, applied to all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_cn ( v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[i]) * zbuf[i])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpneg_mul_cn ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * zbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_cn ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex conjugate times complex floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (conj(xbuf[xstart + xoffs[i]]) * xbuf[zstart + zoffs[i]])
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  Input buffer supplying both multiplication operands.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand, applied to all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_nc ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
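The normal-times-conjugate form is closely related to the conjugate-times-normal form: since x * conj(z) = conj(conj(x) * z), the two results for the same operands are complex conjugates of each other, and swapping the operands turns one form into the other. The scalar sketch below checks this arithmetic only. The names `cfloat`, `neg_mul_nc` and `neg_mul_cn` are hypothetical and model a single output lane.

```cpp
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;

// One output lane of the normal-times-conjugate form: -(x * conj(z)).
cfloat neg_mul_nc(cfloat x, cfloat z) {
    return -(x * std::conj(z));
}

// One output lane of the conjugate-times-normal form: -(conj(x) * z).
cfloat neg_mul_cn(cfloat x, cfloat z) {
    return -(std::conj(x) * z);
}
```

For x = 1 + 2i and z = 3 + 4i, `neg_mul_nc` gives -11 - 2i and `neg_mul_cn` gives -11 + 2i, so only the sign of the imaginary part differs between the two forms.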
v4cfloat fpneg_mul_nc ( v4cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  Input buffer supplying both multiplication operands.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand, applied to all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_nc ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_nc ( v8cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  Input buffer supplying both multiplication operands.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand, applied to all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_nc ( v4cfloat  xbuf,
v4cfloat  zbuf 
)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[i] * conj(zbuf[i]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
zbuf  Second multiplication input buffer.
v4cfloat fpneg_mul_nc ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
v4cfloat  zbuf,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0; i < 4; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(zbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  First multiplication input buffer.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zbuf  Second multiplication input buffer.
zstart  Starting offset for all lanes of Z. This must be a compile-time constant. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for Z. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
v4cfloat fpneg_mul_nc ( v16cfloat  xbuf,
int  xstart,
unsigned int  xoffs,
int  zstart,
unsigned int  zoffs 
)

Multiply and negate for single precision complex times complex conjugate floating point vectors.

 ~~~~~~~~~~~~~~~~~~~
 for (i = 0 ; i < 4 ; i++)
   ret[i] = - (xbuf[xstart + xoffs[i]] * conj(xbuf[zstart + zoffs[i]]))
 ~~~~~~~~~~~~~~~~~~~
Note
This is a two-cycle intrinsic: two microcode instructions are scheduled for it. The same behavior can be achieved by calling the fully configurable multiply-accumulate intrinsic, fpmac_conf, twice so that the two terms of the complex multiplication are added.
Returns
Result vector.
Parameters
xbuf  Input buffer supplying both multiplication operands.
xstart  Starting offset for all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
xoffs  4 x 4 bits: Additional lane-dependent offset for X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When xoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.
Parameters
zstart  Starting offset for the second multiplicand, applied to all lanes of X. The start value refers to complex lanes (lane 0 corresponds to the first real and imaginary value).
zoffs  4 x 4 bits: Additional lane-dependent offset for the second multiplicand in X. The highest (4th) bit in each lane must be 0. Range per lane: [0, 7]. The offsets refer to complex lanes (lane 0 corresponds to the first real and imaginary value).
Note
When zoffs is a runtime parameter, it might be more efficient to use fpmac_conf and calculate the offsets manually.