AR# 69446

Zynq UltraScale+ MPSoC Example Design - Use AXI HPC port to perform coherent transfers


This example design demonstrates how to use the AXI HPC0 port to perform a coherent transfer.

The HPC0 and HPC1 ports on Zynq UltraScale+ enable I/O coherent masters in the PL to snoop the APU caches. 

The HPC ports are preferable to the ACP port in most applications as they provide higher bandwidth and do not disturb the contents of the processor L2 cache.

Implementing a PL master that uses coherent transfers requires consideration of the system on the hardware and software sides.

Note: An Example Design is an answer record that provides technical tips to test a specific functionality. A tip can be a snippet of code, a snapshot, a diagram or a full design implemented with a specific version of the Xilinx tools. 

It is up to the user to "update" these tips to future Xilinx tool releases and to "modify" the Example Design to fulfill their needs. Limited support is provided by Xilinx on these Example Designs.



The PL master must drive AxCACHE and AxPROT lines to appropriate values to enable cache snooping. 

Two things determine the settings of these fields: the MMU setting for the snooped memory region, and the security level of that memory region.

AxCACHE should indicate that the memory address is a cacheable memory type. The following types have been tested and verified to generate coherent transfers:

ARCACHE[3:0]   AWCACHE[3:0]   Memory Type
1011           1111           Write-back Write-allocate
1111           1111           Write-back Read and Write-allocate

AxPROT[1] needs to match the security setting for the memory region. This is a function of the processor exception level. 

In a bare metal application, the processor operates in EL3, which generates secure memory accesses.

To snoop the processor caches, the PL master must also use secure transactions so AxPROT[1] should be set to 0. Different exception levels might require a different setting.
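
The settings above can be captured as constants. The following is a minimal sketch, not the code from the attached design: the encodings come from the table and the AxPROT[1] rule above, but the bit layout used by pack_axi_sideband() is hypothetical and is only meant to illustrate how software might assemble a control word for a GPIO that drives these lines.

```c
#include <assert.h>
#include <stdint.h>

/* AxCACHE encodings from the tested memory types listed above. */
#define ARCACHE_WB_WRITE_ALLOC       0xBu  /* 1011 */
#define ARCACHE_WB_READ_WRITE_ALLOC  0xFu  /* 1111 */
#define AWCACHE_WB_ALLOC             0xFu  /* 1111 for both tested types */

/* AxPROT[1]: 0 = secure access, 1 = non-secure access. */
#define AXPROT_SECURE     0x0u
#define AXPROT_NONSECURE  0x2u

/*
 * HYPOTHETICAL control-word layout (an assumption for illustration only;
 * check the attached design for the real GPIO wiring):
 *   [3:0]   ARCACHE
 *   [7:4]   AWCACHE
 *   [10:8]  ARPROT
 *   [13:11] AWPROT
 */
static uint32_t pack_axi_sideband(uint32_t arcache, uint32_t awcache,
                                  uint32_t arprot, uint32_t awprot)
{
    assert(arcache <= 0xFu && awcache <= 0xFu);
    assert(arprot <= 0x7u && awprot <= 0x7u);
    return arcache | (awcache << 4) | (arprot << 8) | (awprot << 11);
}
```

For a bare metal application running in EL3, the secure setting applies: pack_axi_sideband(ARCACHE_WB_WRITE_ALLOC, AWCACHE_WB_ALLOC, AXPROT_SECURE, AXPROT_SECURE).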


In a bare metal application, two things must happen to allow the HPC ports to snoop the APU cache. 

The first is that the snooped memory region (typically DRAM) must be set to outer shareable. This is accomplished by modifying translation_table.S in the application BSP. See line 96 in the attached file:

.set Memory, 0x405 | (2 << 8) | (0x0)

Refer to section D4.3 of the ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile (ARM DDI 0487A) for details of how the translation table maps memory attributes to memory regions.
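
To see what the (2 << 8) term in the Memory value does, the descriptor can be decoded field by field. This is a small host-side sketch with hypothetical helper names; it assumes the standard ARMv8-A block descriptor layout, in which SH[1:0] occupies bits [9:8], AttrIndx[2:0] bits [4:2], and the access flag bit 10.

```c
#include <stdint.h>

/* SH[1:0] encodings in an ARMv8-A stage 1 block or page descriptor. */
enum shareability {
    SH_NON_SHAREABLE   = 0,
    SH_RESERVED        = 1,
    SH_OUTER_SHAREABLE = 2,
    SH_INNER_SHAREABLE = 3
};

/* Shareability field, bits [9:8]. */
static unsigned desc_shareability(uint64_t desc)
{
    return (unsigned)((desc >> 8) & 0x3u);
}

/* Index into MAIR_ELx that selects the memory type, bits [4:2]. */
static unsigned desc_attr_index(uint64_t desc)
{
    return (unsigned)((desc >> 2) & 0x7u);
}

/* Access flag, bit 10. */
static unsigned desc_access_flag(uint64_t desc)
{
    return (unsigned)((desc >> 10) & 0x1u);
}
```

Applied to 0x405 | (2 << 8) | (0x0), desc_shareability() returns SH_OUTER_SHAREABLE, which is the setting this step requires.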

The second is that the Cache Coherent Interconnect (CCI) port to the APU caches must enable snooping. 

The bare metal application handles this by writing 0x1 to CCI register 0xFD6E4000. See line 167 of the attached file xaxicdma_example_simple_poll.c.
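
The register write described above can be sketched as follows. This is an illustration rather than the code from the attached file: cci_enable_snoop() is a hypothetical helper that takes the register pointer as a parameter so that the logic can also be exercised on a host machine against an ordinary variable.

```c
#include <stdint.h>

/* Address and value taken from the text above. */
#define CCI_SNOOP_CTRL_ADDR  0xFD6E4000u
#define CCI_SNOOP_ENABLE     0x1u

/* Write the snoop-enable value to the given CCI control register. */
static void cci_enable_snoop(volatile uint32_t *snoop_ctrl)
{
    *snoop_ctrl = CCI_SNOOP_ENABLE;
}

/*
 * On the target (not on a host PC) the call would look like:
 *   cci_enable_snoop((volatile uint32_t *)(uintptr_t)CCI_SNOOP_CTRL_ADDR);
 */
```

In a Xilinx standalone BSP the same write is commonly expressed as Xil_Out32(0xFD6E4000, 0x1) using xil_io.h; which form the attached example uses should be checked against the file itself.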

Example Design:

Step-by-Step Instructions:

  1. Extract and open the archived Vivado design, generate bitstream, export hardware with bitstream, and launch SDK.
  2. In SDK, create an Empty Application example.
  3. Import the included C code.
  4. Verify that translation_table.S in the application BSP sets the DRAM memory region to outer shareable.
  5. Program the PL using the bitstream generated in step 1.
  6. Run the application.

Important Notes:

The attached Vivado design and source code show how to implement coherent transfers in a bare metal design using the AXI CDMA IP in simple mode.

An AXI GPIO block is included to provide software control of the AxPROT and AxCACHE lines. The design was created in 2017.1.

Note that the DoSimplePollTransfer function in the source code does not flush the cache before the CDMA reads the data that the processor writes to the SrcBuffer memory.

No flush is needed because the CCI snoops the processor caches and returns the updated value of SrcBuffer.


File Name                        File Size   File Type
                                 42 MB       ZIP
xaxicdma_example_simple_poll.c   12 KB       C
translation_table.S              9 KB        S
Date 10/23/2017
Status Active
Type General Article