DSP Solutions

Overview

With their inherent flexibility, AMD adaptive SoCs and FPGAs are ideal for high-performance or multi-channel digital signal processing (DSP) applications that can take advantage of hardware parallelism. AMD adaptive SoCs and FPGAs combine this processing bandwidth with comprehensive solutions, including easy-to-use design tools for hardware designers, software developers, and system architects.

Hardware Parallelism

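A standard von Neumann DSP processor that performs one multiply-accumulate (MAC) operation per cycle requires 256 cycles to compute one output of a 256-tap FIR filter, while adaptive SoCs and FPGAs, with a dedicated multiplier for each tap, can achieve the same result in a single clock cycle.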

This massive parallelism translates into exceptional levels of DSP performance:

49.5 TeraMACs of fixed-point performance (8-bit)
23.1 TeraFLOPs of single-precision floating-point performance
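The cycle-count comparison above can be made concrete with a small behavioral model. This is a sketch under simplifying assumptions, not AMD tooling: the hypothetical `fir_sequential` stands for a processor doing one MAC per clock, and the hypothetical `fir_parallel` stands for a fully parallel FIR with one multiplier per tap plus an adder tree, producing one output per clock once its pipeline is full (pipeline latency is ignored).

```python
from typing import List, Tuple


def fir_sequential(x: List[int], h: List[int]) -> Tuple[List[int], int]:
    """Single-MAC processor model: taps are processed one per clock."""
    y: List[int] = []
    cycles = 0
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:               # samples before time 0 are treated as zero
                acc += h[k] * x[n - k]
            cycles += 1                  # every tap costs one MAC cycle
        y.append(acc)
    return y, cycles


def fir_parallel(x: List[int], h: List[int]) -> Tuple[List[int], int]:
    """Fully parallel model: one multiplier per tap, one output per clock."""
    window = [0] * len(h)                # tap delay line (shift register)
    y: List[int] = []
    cycles = 0
    for sample in x:
        window = [sample] + window[:-1]  # shift in one new sample per clock
        products = [hk * xk for hk, xk in zip(h, window)]  # all taps multiply at once
        y.append(sum(products))          # adder tree combines the products
        cycles += 1
    return y, cycles


taps = [((3 * k) % 17) - 8 for k in range(256)]    # arbitrary 256-tap filter
signal = [((5 * n) % 23) - 11 for n in range(64)]  # arbitrary test input
y_seq, seq_cycles = fir_sequential(signal, taps)
y_par, par_cycles = fir_parallel(signal, taps)
print(y_seq == y_par, seq_cycles // len(signal), par_cycles // len(signal))  # True 256 1
```

Both models produce identical outputs; the difference is only in how many clocks each output costs (256 versus 1 per output sample).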

Comprehensive DSP Solutions

AMD DSP solutions include silicon, IP, reference designs, development boards, tools, documentation, and training to enable a wide range of applications in a breadth of markets, including but not limited to wireless communications, data center, and aerospace and defense.

Comprehensive Development Flows

Various tool flows are available for different use models and different levels of design abstraction:

Hardware designers can design in:

RTL and system-level design with the Vivado™ Design Suite
C/C++
MATLAB® and Simulink® environment using Vitis™ Model Composer

Software developers accustomed to developing in C/C++ can design using:

Vitis™ unified software platform

System architects can rapidly evaluate new algorithms with:

Vitis Model Composer for system modeling in the MATLAB or Simulink environment
Vitis HLS for algorithmic exploration in C or C++

Choose Your Solution

With AMD adaptive SoCs and FPGAs, designers can use multiple flows to deploy their DSP applications depending on design approach and level of abstraction.

Hardware Designer

Based on an ASIC-class architecture, AMD adaptive SoCs and FPGAs combine I/O bandwidth of several hundred gigabits per second with over 49 TeraMACs of fixed-point DSP performance in the Versal™ Premium series. The AMD DSP slice and its parallelism are key to the DSP performance achievable in the latest generation of AMD FPGAs.

DSP Slice Architecture

The DSP58 slice in Versal devices is the sixth generation of DSP slices in AMD architectures.

This dedicated DSP processing block is implemented in full-custom silicon that delivers leading power/performance, allowing efficient implementations of popular DSP functions such as a multiply-accumulator (MACC), multiply-adder (MADD), or complex multiply.

The slice can also perform logic operations such as AND, OR, and XOR.
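The arithmetic and logic functions named above can be summarized by a simple behavioral model. This is a hypothetical sketch, not a bit-accurate or cycle-accurate DSP58 model: it ignores input widths, pipeline registers, and opmode encoding, and it assumes a 58-bit wrap-around accumulator (suggested by the slice name and the 58-bit wide-mux figure in the comparison table). Pre-adder squaring, listed in that table, is included as well. All function names are illustrative.

```python
from typing import List

P_WIDTH = 58  # assumed accumulator width; wraps like two's complement, no saturation


def wrap(value: int, width: int = P_WIDTH) -> int:
    """Reduce value to a signed two's-complement number of `width` bits."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def pre_add(a: int, d: int, subtract: bool) -> int:
    """Pre-adder stage: A + D or A - D."""
    return a - d if subtract else a + d


def madd(a: int, b: int, c: int, d: int = 0, subtract_d: bool = False) -> int:
    """Multiply-adder: P = (A +/- D) * B + C."""
    return wrap(pre_add(a, d, subtract_d) * b + c)


def macc(a_seq: List[int], b_seq: List[int]) -> int:
    """Multiply-accumulator: P <- P + A * B on every clock."""
    p = 0
    for a, b in zip(a_seq, b_seq):
        p = wrap(p + a * b)
    return p


def square(a: int, d: int = 0, subtract_d: bool = False) -> int:
    """Pre-adder squaring: (A +/- D)^2."""
    s = pre_add(a, d, subtract_d)
    return wrap(s * s)


def logic(op: str, x: int, y: int) -> int:
    """Bitwise AND/OR/XOR, truncated to P_WIDTH bits."""
    results = {"AND": x & y, "OR": x | y, "XOR": x ^ y}
    return results[op] & ((1 << P_WIDTH) - 1)


print(macc([1, 2, 3], [4, 5, 6]))    # 32
print(madd(3, 4, 5, d=1))            # (3 + 1) * 4 + 5 = 21
print(logic("XOR", 0b1100, 0b1010))  # 6
```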

The Versal device DSP58 architecture builds on the success of the UltraScale™ FPGA DSP48E2 with further enhancements:

Wider multiplier (27 x 24 bits)
Single-precision floating-point multiplier
18x18 complex multiplication using two back-to-back DSPs
INT8 vector dot product mode

These enhancements help DSP-critical applications perform more computation within the DSP58 slice before going into the FPGA fabric, ultimately leading to both resource and power savings.
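The INT8 vector dot product mode can be pictured as several 8-bit multiplies summed inside one slice per clock. The sketch below is a functional model only: the number of element pairs a DSP58 consumes per clock is left as a hypothetical `lanes` parameter rather than a documented figure, and `int8_dot` and `clocks_needed` are illustrative names.

```python
from typing import Sequence

INT8_MIN, INT8_MAX = -128, 127


def int8_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Dot product of two INT8 vectors with a wide (non-overflowing) accumulator."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    for v in list(a) + list(b):
        if not INT8_MIN <= v <= INT8_MAX:
            raise ValueError(f"{v} is outside the INT8 range")
    # Each INT8 x INT8 product fits in 16 bits; Python ints stand in for a wide accumulator.
    return sum(x * y for x, y in zip(a, b))


def clocks_needed(length: int, lanes: int) -> int:
    """Clocks for a slice that consumes `lanes` element pairs per clock (hypothetical)."""
    return -(-length // lanes)  # integer ceiling division


print(int8_dot([1, -2, 3], [4, 5, -6]))  # -24
print(clocks_needed(64, 3))              # 22
```

Doing several narrow multiplies per slice per clock is what lets the same DSP count deliver more INT8 operations than INT16 operations.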

DSP48E2 (UltraScale) vs DSP58 (Versal) Slice Features

Function	UltraScale	Versal
DSP Tile/Slice Type	DSP48E2	DSP58
Multiple Add/Sub/Acc operations
Multiplier and MACC	27x18	27x24
Squaring: [(A or B) ± D]²
WMUX Feedback
Ultra Efficient Complex Multiply CMACC	3 x DSP48E2	2 x DSP58
SIMD Support
Integrated Pattern Detect Circuitry
Integrated Logic Unit
Wide Mux Functions	48-bit	58-bit
Wide XOR	96-bit	116-bit
Single Precision Floating Point Multiplier
Optional 96-bit Output
Cascade Routing
Pipeline Registers
D Pre-adder
Sequential Complex Multiply, AB dyn access
AB Register Pipeline Balancing Improved
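The Ultra Efficient Complex Multiply row reflects the fact that a complex product does not need four separate real multipliers. One well-known identity computes (a + jb)(c + jd) with three real multiplications and a few additions, which map naturally onto DSP pre-adders. The sketch below checks that identity against the textbook four-multiply form; it illustrates the arithmetic only and is not a description of how the DSP48E2 or DSP58 implementations are wired. Function names are hypothetical.

```python
from typing import Tuple


def cmult_4(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """(a + jb)(c + jd) using the textbook four real multiplies."""
    return a * c - b * d, a * d + b * c


def cmult_3(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Same product with three real multiplies; the sums suit pre-adder stages."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2  # (real, imaginary)


print(cmult_3(3, -2, 5, 7))  # (29, 11)
```

Expanding the terms confirms the identity: k1 - k3 = ac - bd and k1 + k2 = ad + bc.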

Tools and Flows

Depending on your design preferences, AMD has tools supporting RTL, C/C++, and model-based design entry. This flexibility in the design flow, along with an extensive DSP IP catalog, eases adoption of AMD tools and devices.

Visit Tools, Libraries & Frameworks for more information.

DSP Performance Metrics

The following table shows some of the key DSP performance metrics for the UltraScale™, UltraScale+™, and Versal™ families. For adaptive SoC device performance, see the Software Developer section.

	Kintex UltraScale	Kintex UltraScale+	Virtex UltraScale	Virtex UltraScale+	Versal AI Core	Versal AI Edge	Versal Prime	Versal Premium
System Logic Elements (K)	318–1,451	356–1,143	783–5,541	862–3,780	540–1,968	44–1,139	329–2,233	833–7,352
DSP Slices	768–5,520	1,368–3,528	600–2,880	2,280–12,288	928–1,968	90–1,312	464–3,984	1,140–14,352
27x18 Multipliers	768–5,520	1,368–3,528	600–2,880	2,280–12,288	928–1,968	90–1,312	464–3,984	1,140–14,352
INT8 GOPs¹	1,774–14,315	4,263–11,000	1,554–7,469	7,108–38,318	6,403–13,579	62–9,052	3,201–27,489	7,866–99,029
INT16 GOPs	1,014–8,180	2,436–6,286	888–4,268	4,062–21,896	2,134–4,526	21–3,017	1,067–9,163	2,622–33,010
Complex INT18 GOPs	676–5,453	1,624–4,191	592–2,845	2,708–14,597	913–1,937	8–1,291	456–3,920	1,122–14,122
Single Precision Floating Point (GFLOPs)²	320–2,685	800–1,673	294–1,411	1,354–7,299	1,494–3,168	14–2,112	747–6,414	1,835–23,107
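Peak figures like these scale as (number of DSP slices) × (operations per slice per clock) × (clock frequency). The sketch below assumes, for illustration only, that an INT16 figure counts two operations (one multiply and one add) per slice per clock; under that assumption the Kintex UltraScale row implies the clock rates printed by the example. Those clocks are back-calculated from the table, not published Fmax values, and the helper names are hypothetical.

```python
def peak_gops(dsp_slices: int, ops_per_clock: float, clock_ghz: float) -> float:
    """Peak throughput in GOPs = slices x ops per slice per clock x clock (GHz)."""
    return dsp_slices * ops_per_clock * clock_ghz


def implied_clock_ghz(gops: float, dsp_slices: int, ops_per_clock: float) -> float:
    """Invert peak_gops to find the clock a table entry implies."""
    return gops / (dsp_slices * ops_per_clock)


# Kintex UltraScale INT16 row: 1,014-8,180 GOPs for 768-5,520 DSP slices.
OPS_PER_CLOCK = 2  # assumption: one multiply + one add per slice per clock
print(round(implied_clock_ghz(8180, 5520, OPS_PER_CLOCK), 3))  # 0.741
print(round(implied_clock_ghz(1014, 768, OPS_PER_CLOCK), 3))   # 0.66
```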

Software Developer

We have introduced software development environments and a comprehensive set of familiar, powerful tools, libraries, and methodologies that allow software developers to target AMD adaptive SoCs and FPGAs with ease. With the high-level abstraction of the Vitis™ unified software platform, we offer familiar, GPU-like embedded application development and runtime experiences for C, C++, and/or OpenCL development.

AMD MPSoCs and Versal Devices

The Zynq™ UltraScale+™ MPSoC and the Versal architecture combine a powerful processing system (PS), incorporating Arm® Cortex® processors, and user-programmable logic (PL), in a single device.

Application Profiling for Acceleration

The Vitis unified software platform lets you profile an application and create hardware accelerators that run more efficiently in the programmable logic (PL), where the flexibility and parallelism of the FPGA provide large performance improvements. Other functions of the application can continue to run in the processing system (PS) in parallel if desired.

By targeting AMD adaptive SoCs and FPGAs, many DSP and embedded applications can achieve higher efficiency and lower power.
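Profiling matters because the end-to-end gain is bounded by how much of the runtime moves to the PL. A standard way to estimate this is Amdahl's law; it is a general rule of thumb, not a Vitis feature. In the sketch, `accelerated_fraction` is the share of runtime that profiling attributes to the offloaded functions and `kernel_speedup` is how much faster those functions run in the PL; the numbers in the example are hypothetical.

```python
def overall_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only part of the runtime is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - accelerated_fraction          # still runs at the original speed
    offloaded = accelerated_fraction / kernel_speedup
    return 1.0 / (remaining + offloaded)


# Hypothetical profile: 90% of runtime in a filter that runs 10x faster in the PL.
print(round(overall_speedup(0.9, 10.0), 2))  # 5.26
```

In this example, even an infinitely fast accelerator would cap the overall gain at 10x, because the remaining 10% of the runtime still executes on the PS.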

Features and DSP Performance of AMD SoC Devices

The following tables show some of the key features and DSP performance metrics for the AMD Zynq 7000 SoC and Zynq UltraScale+ MPSoC families. For non-SoC device performance, see the Hardware Designer section.

Processing System	Zynq 7000 SoC	Zynq UltraScale+ MPSoC
Application Processing Unit (APU)	Single/dual-core ARM Cortex-A9 MPCore™ up to 1 GHz; ARMv7-A architecture; NEON™ media-processing engine; single- and double-precision Vector Floating Point Unit (VFPU)	Dual/quad-core ARM Cortex-A53 MPCore up to 1.5 GHz; ARMv8-A architecture; Neon Advanced SIMD media-processing engine; single/double-precision Floating Point Unit (FPU)
Real-Time Processing Unit (RPU)	-	Dual-core ARM Cortex-R5 MPCore up to 600 MHz; ARMv7-R architecture; single/double-precision Floating Point Unit (FPU)
Multimedia Processing	-	GPU: ARM Mali™-400 MP2 up to 667 MHz; OpenGL ES 1.1 and 2.0 support; OpenVG 1.1 support; video codec supporting H.264/H.265 (EV devices only)
Dynamic Memory Interface	DDR3, DDR3L, DDR2, LPDDR2	DDR4, LPDDR4, DDR3, DDR3L, LPDDR3
High-Speed Peripherals	USB 2.0, Gigabit Ethernet, SD/SDIO	PCIe® Gen2, USB 3.0, SATA 3.1, DisplayPort, Gigabit Ethernet, SD/SDIO
Security	RSA, AES, and SHA, ARM TrustZone®	RSA, AES, and SHA, ARM TrustZone
Max I/O Pins	128	214

Programmable Logic	Zynq 7000 SoC	Zynq UltraScale+ MPSoC
System Logic Elements (K)	23–444	103–1,045
Max Memory (Mb)	1.8–26.5	5.3–70.6
Max I/O Pins	100–362	252–668
DSP Slices	60–2,020	240–3,528
18x18 Multipliers	60–2,020	240–3,528
Fixed Point Performance (GMACs) (1)	42–1,313	213–3,143
Fixed Point Performance For Symmetric Filters (GMACs) (1) (2)	84–2,626	426–6,286
INT8 GOPs (1) (3)	84–2,626	745–11,000
INT16 GOPs (1)	84–2,626	426–6,286
Single Precision Floating Point (GFLOPs) (1) (4)	23–716	142–1,673
Single Precision Floating Point (GFLOPs) (1) (5)	17–537	106–1,571
Half Precision Floating Point (GFLOPs) (1) (6)	34–1,074	212–3,142

Notes:

(1) All performance calculations are based on -2 speed grade parts for the Zynq 7000 SoC and -3 speed grade parts for the Zynq UltraScale+ MPSoC.
(2) Using the pre-adder, DSP performance can be increased 2x for symmetric filters.
(3) Please refer to WP486 – Deep Learning with INT8 Optimization on AMD Devices (not applicable to Zynq devices).
(4) Single-precision floating-point performance using the Floating Point Operator core with 3 DSP slices.
(5) Single-precision floating-point performance using the Floating Point Operator core with 4 DSP slices.
(6) Half-precision floating-point performance using the Floating Point Operator core with 2 DSP slices.
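The symmetric-filter note relies on coefficient symmetry: when h[k] = h[N-1-k], the two samples that share a coefficient can be summed in the DSP pre-adder first, so each multiplier serves two taps. The sketch below is a plain behavioral model of that idea (the function names are illustrative) that counts multiplies to show the 2x reduction for an even-length filter.

```python
from typing import List, Tuple


def fir_direct(x: List[int], h: List[int]) -> Tuple[List[int], int]:
    """Direct-form FIR; returns outputs and the number of multiplies used."""
    y, mults = [], 0
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            acc += hk * (x[n - k] if n - k >= 0 else 0)
            mults += 1
        y.append(acc)
    return y, mults


def fir_symmetric(x: List[int], h: List[int]) -> Tuple[List[int], int]:
    """Symmetric FIR: pre-add the two samples that share a coefficient, then multiply once."""
    n_taps = len(h)
    if any(h[k] != h[n_taps - 1 - k] for k in range(n_taps)):
        raise ValueError("coefficients must be symmetric")

    def sample(i: int) -> int:
        return x[i] if i >= 0 else 0  # zero history before the first sample

    y, mults = [], 0
    for n in range(len(x)):
        acc = 0
        for k in range(n_taps // 2):
            pre = sample(n - k) + sample(n - (n_taps - 1 - k))  # pre-adder
            acc += h[k] * pre
            mults += 1
        if n_taps % 2:  # odd length: the centre tap has no partner
            mid = n_taps // 2
            acc += h[mid] * sample(n - mid)
            mults += 1
        y.append(acc)
    return y, mults


coeffs = [1, 2, 3, 3, 2, 1]
data = [3, -1, 4, 1, -5, 9, 2, -6]
print(fir_direct(data, coeffs)[1], fir_symmetric(data, coeffs)[1])  # 48 24
```

For an odd-length filter the centre tap still needs its own multiply, so the saving is slightly less than 2x.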

To learn more about AMD adaptive SoCs and MPSoCs, go to:

AMD adaptive SoCs & MPSoCs

DSP in the Processing Subsystem

The Processing System (PS) provides DSP capabilities through its ARM processing cores.

For more information on DSP capabilities in the ARM processors, visit:

Cortex-A Series Family
SIMD and Advanced SIMD (NEON) technologies
ARM Floating Point Architecture

Some useful examples can be found at the following locations:

For Zynq UltraScale+ MPSoC, see UG1211 for a demonstration of an FFT using the ARM NEON instruction set.

For Zynq 7000 SoC, the following Tech Tips are available on the Xilinx wiki for targeting the Cortex-A9 and ARM SIMD:

AMD Data-type Support

AMD devices offer very flexible data-type support. Fixed-point, floating-point, and integer types of varying precision are supported natively in AMD tools, with floating point implemented with the aid of the Floating Point Operator IP core.

Floating-point designs implemented on FPGAs generally use more resources and power than fixed-point or integer implementations. Converting to a fixed-point solution where possible can bring large benefits:

Fewer FPGA resources
Lower power
Lower cost

For more details on the benefits of converting from floating point to fixed point data types, please read WP491.
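As an illustration of what converting to fixed point involves, the sketch below quantizes a floating-point value to a signed fixed-point code with round-to-nearest and saturation, then measures the error. The 16-bit word with 15 fractional bits (often written Q1.15) is an example choice, and `to_fixed`/`from_fixed` are hypothetical helpers, not the method described in WP491.

```python
def to_fixed(x: float, word_bits: int, frac_bits: int) -> int:
    """Quantize x to a signed fixed-point integer code (round to nearest, then saturate)."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    code = round(x * (1 << frac_bits))  # Python's round() sends exact halves to the even neighbour
    return max(lo, min(hi, code))       # saturate instead of wrapping on overflow


def from_fixed(code: int, frac_bits: int) -> float:
    """Convert a fixed-point code back to its real value."""
    return code / (1 << frac_bits)


WORD, FRAC = 16, 15  # example format: Q1.15
for value in (0.3, -0.75, 1.0):
    code = to_fixed(value, WORD, FRAC)
    error = from_fixed(code, FRAC) - value
    print(value, code, f"{error:.2e}")
```

In this format, 1.0 saturates to 32767 because Q1.15 cannot represent +1.0 exactly; for in-range values the rounding error is at most half an LSB (2^-16 here).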

Benchmarks

The tables below show a small selection of algorithms and the performance improvements possible when using an AMD device, in particular its programmable logic (PL) fabric, to accelerate the design.

Algorithm	CPU/GPU	Zynq UltraScale+ MPSoC	Advantage
Stereo LocalBM @ 2K	ARM: 0.5 FPS/Watt; Nvidia: 3.5 FPS/Watt	146 FPS/Watt	292x vs. ARM; 42x vs. Nvidia
Optical Flow (Lucas-Kanade)	ARM: 0.1 FPS/Watt; Nvidia: 0.8 FPS/Watt	7.1 FPS/Watt	9.3x
GoogleNet (Batch=1)	ARM: 0.1 Imgs/s/W; Nvidia: 8.8 Imgs/s/W	53 Imgs/s/W	530x vs. ARM; 6x vs. Nvidia

Notes:

ARM: quad-core Cortex-A53 running on a Raspberry Pi at 1200 MHz
Nvidia benchmarks were done using Tegra X1
Optical Flow (LK) – Window Size 11x1

Algorithm	CPU/DSP	Zynq 7000	Advantage
Forward Projection	ARM: 3 sec/view	0.016 sec/view	188x
Motion Detection	ARM: 0.7 FPS	67 FPS	90x
Noise Reduction-Sobel	ARM: 1 FPS	67 FPS	60x
Canny Edge Detection	ARM: 0.66 FPS	40 FPS	45x
3D Image Reconstruction	ARM: 75k	8k	9x
DPD	ARM: 506 ms	31.3 ms	16x
FIR	TI DSP: 64020 ns	1200 ns	53x
FFT	TI DSP: 1036 ns	128 ns	8x

Notes:

ARM results use only the Cortex-A9 core on the Zynq devices
TI benchmarks were done using C66 DSP core

System Architect

AMD high-level design tools such as Vitis Model Composer and Vitis High-Level Synthesis (HLS) provide a level of abstraction that empowers system architects and domain experts to rapidly evaluate new algorithms and focus on developing the differentiating parts of their design. The complete AMD DSP solution combines these design tools with IP, reference designs, methodologies, and boards that work together to reach a working production design in the shortest possible time.

Vitis Model Composer is a model-based design tool that uses the MATLAB and Simulink environment to define, test, and implement production-quality DSP algorithms in programmable logic in a fraction of traditional RTL development time.

The tool provides:

100+ optimized DSP blocks, many with C simulation models for 2–3x faster simulation than RTL
Integration of RTL, IP, Simulink, MATLAB, and C/C++ components of a DSP system
Bit-accurate and cycle-accurate floating-point and fixed-point simulations
Hardware co-simulation to accelerate simulation and validate algorithm on working hardware
Automatic code generation from Simulink to packaged IP or low-level HDL
Automatic generation of HDL test bench, including test vectors

Learn more about Vitis Model Composer:

Introduction to Vitis Model Composer (Video)
Vitis Model Composer (Documentation)

High Level Synthesis

High-Level Synthesis (HLS), included in the Vitis unified software platform, enables portable C, C++, and SystemC algorithm specifications to be targeted directly to AMD FPGAs and adaptive SoCs without the need to write RTL. Just as compilers translate C/C++ for different processor architectures, the HLS compiler translates C/C++ into hardware for AMD FPGAs and adaptive SoCs.

Learn more about Vitis High-Level Synthesis:

Getting Started with Vitis High-Level Synthesis (Video)
Vitis High Level Synthesis User Guide (Documentation)

Tools & Ecosystem

AMD provides best-in-class tools to enable Digital Signal Processing (DSP) applications to be implemented efficiently and at low power on AMD adaptive SoCs and FPGAs. Whether you are designing with RTL, C/C++/SystemC, or MATLAB/Simulink, the AMD tools below can facilitate your DSP design and reduce your time-to-market.

Libraries and Frameworks

AMD offers a range of libraries that are optimized for performance, resource utilization and ease of use.

Libraries & Frameworks	Description	Application
GitHub Repositories	AMD has created GitHub repositories containing useful examples for many applications, including DSP-related functions.	Vitis, Vitis Model Composer, Vitis Acceleration
Vitis Accelerated Libraries	AMD has created an extensive set of open-source, performance-optimized libraries that offer out-of-the-box acceleration with minimal to zero-code changes to your existing applications.	Vitis Libraries


Partners, Boards & Kits

AMD and its partners work together to produce tools and boards to ease the adoption of AMD FPGAs and SoCs for DSP applications across many market segments.

Partner	Description	Solution
Avnet DSP-Centric Development Kits and Modules	MathWorks and leading high-speed analog supplier Avnet offer DSP-centric development kits and production-ready system-on-modules (SOMs) for embedded vision, software-defined radio, and high-performance motor control.	Avnet
MathWorks Computing Software	MathWorks MATLAB® and Simulink® can significantly reduce adaptive SoC and FPGA system development time by enabling users to create complex signal and image processing, communications, and control algorithms; validate system requirements early in the development process; and generate and verify HDL and C code targeting AMD FPGAs and SoCs.	MathWorks
Analog Devices Add-On Boards	The AD-FMCDAQ2-EBZ FMC board is a self-contained data acquisition and signal synthesis prototyping platform designed for ease of use, enabling quicker end-system signal processing development. AD9680: 14-bit, 1.0 GSPS, JESD204B ADC; AD9144: quad, 16-bit, 2.8 GSPS, JESD204B DAC; AD9523-1: 14-output, 1 GHz clock.	Analog Devices


Resources

Deep Learning with INT8 Optimization on AMD Devices

The AMD integrated DSP architecture can achieve 1.75x the solution-level performance of other FPGA DSP architectures for INT8 deep learning operations.

View White Paper

Multi-Channel Fractional SRC Filter in HLS

This application note focuses on the design of a multi-channel fractional sample rate conversion (SRC) filter using Vivado High-Level Synthesis (HLS), which takes C++ source code and generates highly efficient synthesizable Verilog or VHDL code for the FPGA.

View Application Note
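For readers new to fractional sample rate conversion, the core computation can be sketched as upsample by `up`, filter, then downsample by `down`, evaluating only the filter taps that line up with real (not inserted-zero) samples, which is the idea behind polyphase SRC filters. This Python sketch is not the application note's HLS design: `resample` and `resample_channels` are hypothetical names, the filter `h` is supplied by the caller (no anti-aliasing filter design is shown), and each channel is processed independently.

```python
from typing import List


def resample(x: List[float], up: int, down: int, h: List[float]) -> List[float]:
    """Rational (up/down) resampler that skips the zeros an explicit upsampler would insert."""
    n_out = (len(x) * up) // down
    y: List[float] = []
    for m in range(n_out):
        n = m * down          # output position on the upsampled time grid
        acc = 0.0
        k = n % up            # first tap aligned with a real input sample
        while k < len(h) and k <= n:
            acc += h[k] * x[(n - k) // up]
            k += up           # taps in between would only multiply inserted zeros
        y.append(acc)
    return y


def resample_channels(channels: List[List[float]], up: int, down: int,
                      h: List[float]) -> List[List[float]]:
    """Apply the same resampler to each channel independently."""
    return [resample(ch, up, down, h) for ch in channels]


# Upsampling by 2 with a two-tap hold filter repeats each sample of each channel.
print(resample_channels([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], 2, 1, [1.0, 1.0]))
```

Skipping the zero-valued products is what keeps the multiply count proportional to the output rate rather than to the intermediate upsampled rate.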


Notes (DSP Performance Metrics table):

¹ Please refer to WP486 – Deep Learning with INT8 Optimization on AMD Devices.
² Single-precision floating-point performance using the Floating Point Operator core with 3 DSP slices in UltraScale+.
