Unexpected Diversity: Quantitative Memory Analysis for Zynq UltraScale+ Systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Memory throughput is one of the major bottlenecks for accelerator performance. Now that Zynq UltraScale+ systems are being deployed from exascale to the edge, it is important for developers to understand their limitations and the optimizations available. In this paper, we extensively evaluate memory performance and behaviour across various AXI port combinations, burst sizes, access patterns, and numbers of accelerators per AXI port. Our results on the ZCU102 and Ultra96 boards show that 1) the effective throughput of these systems is only 75% and 92.5% of the theoretical maximum, respectively, 2) burst sizes of 128 and 192 bytes are often optimal, 3) AXI ports of the same type do not always exhibit similar behaviour, 4) multiplexing accelerators in the PL can provide a better throughput distribution than multiplexing in the PS, and 5) using all AXI ports does not lead to the highest performance.
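The "effective throughput as a fraction of the theoretical maximum" figure quoted in the abstract can be illustrated with a minimal sketch. The function names, the peak-bandwidth formula (transfer rate times bus width), and the numeric inputs below are assumptions chosen for illustration only; they are not taken from the paper and do not represent its measurement method or the boards' actual memory configurations.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in MB/s for a memory interface.

    Assumes one bus-width transfer per transfer cycle (hypothetical model).
    """
    return mega_transfers_per_s * bus_width_bits / 8


def efficiency(measured_mb_s: float, peak_mb_s: float) -> float:
    """Fraction of the theoretical peak that was actually achieved."""
    if peak_mb_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    return measured_mb_s / peak_mb_s


# Illustrative numbers only (not results from the paper):
# a 64-bit interface at 2400 MT/s and a hypothetical measured throughput.
peak = peak_bandwidth_mb_s(2400, 64)
measured = 14400.0
print(f"peak = {peak:.0f} MB/s, efficiency = {efficiency(measured, peak):.1%}")
```

Expressing results as a ratio rather than an absolute number is what makes it possible to compare two boards with different memory interfaces on one scale.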

Bibliographical metadata

Original language: English
Title of host publication: International Conference on Field-Programmable Technology (FPT)
Publication status: Accepted/In press - 7 Oct 2019