Building the Adaptable, Intelligent World

Ivo Bolsens

Senior Vice President & Chief Technology Officer



Mountains of Unstructured Data

One Architecture Can't Do It Alone

This is the Era of Heterogeneous Compute



# Today's Developer Needs

Performance for a Diverse Range of Applications

**Software Programmability** 

Adaptability to Keep Pace with Rapid Innovation




## Enter the ACAP

A New Class of Devices for Today's Challenges



**Device Category** 



# **Compute Acceleration**





### The Industry's First ACAP

Heterogeneous Parallel

Scalable SW Programmable

**HW Adaptable** 





### Network-on-Chip (NoC)

#### Ease of Use

Inherently Software Programmable

Available at Boot, No Place-and-Route Required

#### High Bandwidth and Low Latency

Multi-Terabit/Sec Throughput

Guaranteed QoS

#### **Power Efficiency**

8x Power Efficiency vs. Soft Implementations

Arbitration Across Heterogeneous Engines



# Platform for Any Developer

**User Application**: C, C++, Python

Adaptable for **Any Application**

Application-Specific Frameworks: Machine Learning | Video | Genomics | Search | Financial Modeling | Database

New Unified Software Development Environment

Software Programmable

Xilinx & Ecosystem

OS & Embedded Run-Time

**Custom HW**: C, Xilinx Libraries

**HW Libraries**

**Scalar Engines** | **Adaptable Engines** | **Intelligent Engines**

Heterogeneous Platform

**VERSAL** 

## Versal Multi-Market Platform







#### **VERSAL AI Core Series**

# For 5G Beamforming & CloudRAN

AI Engines Provide >5X Compute Density for Advanced Wireless Compute

### Versal Multi-Market Platform





AI ADOPTION ACROSS MARKETS

# Projected Growth in AI Inference





Barclays Research, Company Reports, May 2018

# Challenges



The Rate of AI Innovation



Performance at Low Latency



**Low Power Consumption** 



Whole App Acceleration



### The Rate of AI Model Innovation

Model types: CNN, MLP

Applications: Classification, Detection, Segmentation, Speech Recognition, Recommendation Engine, Anomaly Detection

DIVERSE MODELS OVER A BROAD RANGE OF APPLICATIONS

### The Rate of AI Model Innovation

#### Classification





https://arxiv.org/pdf/1605.07678.pdf

https://arxiv.org/pdf/1608.06993.pdf

https://arxiv.org/pdf/1709.01507.pdf

https://arxiv.org/pdf/1611.05431.pdf

# Rate of Innovation Outpaces Silicon Cycles

#### AlexNet



#### GoogLeNet



#### DenseNet





Silicon Design Cycle (time)

**Production Design** 





# Low Latency is Critical for Inference



High Throughput OR Low Latency

High Throughput AND Low Latency
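One common reason throughput and latency trade off is batching: hardware that spreads a fixed per-launch overhead across a large batch gains throughput, but every request then waits for the whole batch. The sketch below uses a simple hypothetical cost model (fixed overhead plus per-item time). The function names and parameter values are illustrative assumptions, not measurements from this presentation.

```python
def batch_latency_ms(batch_size, overhead_ms=4.0, per_item_ms=0.5):
    """Time to process one batch under a hypothetical linear cost model."""
    return overhead_ms + batch_size * per_item_ms


def throughput_per_s(batch_size, overhead_ms=4.0, per_item_ms=0.5):
    """Items completed per second when batches run back to back."""
    latency = batch_latency_ms(batch_size, overhead_ms, per_item_ms)
    return 1000.0 * batch_size / latency


if __name__ == "__main__":
    # Assumed batch sizes, chosen only to show the trend.
    for b in (1, 8, 64):
        print(b, batch_latency_ms(b), round(throughput_per_s(b), 1))
```

In this model, larger batches raise both throughput and latency, which is the "OR" case. Shrinking the fixed overhead is one way a design could approach the "AND" case.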

## Inference Moving to Lower Precision



#### **RELATIVE ENERGY COST**

| Operation  | Energy (pJ) |
|------------|-------------|
| 8b Add     | 0.03        |
| 16b Add    | 0.05        |
| 32b Add    | 0.1         |
| 16b FP Add | 0.4         |
| 32b FP Add | 0.9         |
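The relative costs in the table can be made explicit by normalizing each entry to the 8-bit add. This is a minimal sketch that uses only the values listed above; the names `ENERGY_PJ` and `relative_cost` are illustrative and do not come from the slides.

```python
# Energy per operation in picojoules, copied from the table above.
ENERGY_PJ = {
    "8b Add": 0.03,
    "16b Add": 0.05,
    "32b Add": 0.1,
    "16b FP Add": 0.4,
    "32b FP Add": 0.9,
}


def relative_cost(operation, baseline="8b Add"):
    """Return how many times more energy `operation` uses than `baseline`."""
    return round(ENERGY_PJ[operation] / ENERGY_PJ[baseline], 1)


if __name__ == "__main__":
    for name in ENERGY_PJ:
        print(f"{name:<11} {relative_cost(name):>5}x")
```

With these figures, a 32-bit floating-point add costs about 30 times the energy of an 8-bit integer add.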

### Reduced Precision Arithmetic





# Need for Adaptable Hardware

Domain Specific Architectures (DSAs) on Adaptable Platforms

**Custom Data Flow**

**Custom Memory Hierarchy**

**Custom Precision**

## Low Latency

### Xilinx's Unique Advantage

#### **Latency Tolerant Inference**



### AI Inference Acceleration

**Leveraging AI Engines** 

Majority of Adaptable & Scalar Engines Available for Whole App Acceleration

<sup>(1)</sup> Measured on EC2 Xeon Platinum 8124 Skylake, c5.18xlarge AWS instance, Intel Caffe: https://github.com/intel/caffe

<sup>(2)</sup> V100 results taken from Oct 9th updates on www.Nvidia.com

<sup>(3)</sup> Versal Core Series

<sup>(4)</sup> GoogLeNet V1 throughput (Img/sec)

# Low Latency

### Xilinx's Unique Advantage

Sub-7 ms Latency




# Low Latency

### Xilinx's Unique Advantage

Sub-2 ms Latency




## Whole Application Acceleration

### **Intelligent Video Analytics**



# Enabling the Development Community



# Platforms for Every Developer

| Developer                          | Tools                                                | Ecosystem                                |
|------------------------------------|------------------------------------------------------|------------------------------------------|
| Data Scientists                    | Frameworks: Python, APIs                             | DeePhi, Caffe, MXNet, FFmpeg, TensorFlow |
| SaaS Developers                    | FaaS Platform                                        | AWS, Huawei, Alibaba Cloud (Aliyun)      |
| Application Developers             | SDx: C++, OpenCL, Libraries, XRT open source runtime | Linux, RTOS                              |
| Embedded Developers                | MPSoC Software Environment                           |                                          |
| Hardware-Aware Software Developers | HLS: C++ IP Functions                                |                                          |
| System Integrators                 | IP Integrator                                        |                                          |
| Hardware Developers                | Vivado Design Suite: RTL Full Design                 |                                          |







Faster than CPUs & GPUs

Latency Advantage Over GPUs



Optimized for Any Workload

Adapt to Changing Algorithms



Deploy in the Cloud or On-Premises

Applications Available Now









| TOOLS                              | USER                              |
|------------------------------------|-----------------------------------|
| Frameworks                         | Data Scientists and AI Developers |
| Libraries, Compilers, Middleware   | Application Developers            |
| Firmware and Runtime               | Software Developers               |
| Integrated Development Environment | Hardware and Software Developers  |

# Accessible: Cloud & On-Premises



Building the Adaptable, Intelligent World

