Enabling Shared Memory Communication in Networks of MPSoCs

Citation formats

Standard

Enabling Shared Memory Communication in Networks of MPSoCs. / Lant, Joshua; Concatto, Caroline; Attwood, Andrew; Pascual Saiz, Jose; Ashworth, Mike; Navaridas, Javier; Luján, Mikel; Goodacre, Anthony.

In: Concurrency and Computation: Practice and Experience, 2018.

Research output: Contribution to journal › Article › peer-review

Harvard

Lant, J, Concatto, C, Attwood, A, Pascual Saiz, J, Ashworth, M, Navaridas, J, Luján, M & Goodacre, A 2018, 'Enabling Shared Memory Communication in Networks of MPSoCs', Concurrency and Computation: Practice and Experience. https://doi.org/10.1002/cpe.4774

APA

Lant, J., Concatto, C., Attwood, A., Pascual Saiz, J., Ashworth, M., Navaridas, J., Luján, M., & Goodacre, A. (2018). Enabling Shared Memory Communication in Networks of MPSoCs. Concurrency and Computation: Practice and Experience. https://doi.org/10.1002/cpe.4774

Vancouver

Lant J, Concatto C, Attwood A, Pascual Saiz J, Ashworth M, Navaridas J et al. Enabling Shared Memory Communication in Networks of MPSoCs. Concurrency and Computation: Practice and Experience. 2018. https://doi.org/10.1002/cpe.4774

Author

Lant, Joshua ; Concatto, Caroline ; Attwood, Andrew ; Pascual Saiz, Jose ; Ashworth, Mike ; Navaridas, Javier ; Luján, Mikel ; Goodacre, Anthony. / Enabling Shared Memory Communication in Networks of MPSoCs. In: Concurrency and Computation: Practice and Experience. 2018.

Bibtex

@article{e3fbb8c106d5477b82ed6a7de27a0e62,
title = "Enabling Shared Memory Communication in Networks of MPSoCs",
abstract = "Ongoing transistor scaling and the growing complexity of embedded system designs have led to the rise of MPSoCs (Multi-Processor Systems-on-Chip), combining multiple hard-core CPUs and accelerators (FPGA, GPU) on the same physical die. These devices are of great interest to the supercomputing community, who are increasingly reliant on heterogeneity to achieve power and performance goals in these closing stages of the race to exascale. In this paper, we present a network interface architecture and networking infrastructure, designed to sit inside the FPGA fabric of a cutting-edge MPSoC device, enabling networks of these devices to communicate within both a distributed and shared memory context, with reduced need for costly software networking system calls. We present our implementation and prototype system and discuss the main design decisions relevant to the use of the Xilinx Zynq UltraScale+, a state-of-the-art MPSoC, and the challenges to be overcome given the device's limitations and constraints. We demonstrate the working prototype system connecting two MPSoCs, with communication between processor and remote memory region and accelerator. We then discuss the limitations of the current implementation and highlight areas of improvement to make this solution production-ready.",
author = "Joshua Lant and Caroline Concatto and Andrew Attwood and {Pascual Saiz}, Jose and Mike Ashworth and Javier Navaridas and Mikel Luj{\'a}n and Anthony Goodacre",
year = "2018",
doi = "10.1002/cpe.4774",
language = "English",
journal = "Concurrency and Computation: Practice and Experience",
issn = "1532-0626",
publisher = "John Wiley & Sons Ltd",
}
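
To cite this work from LaTeX, save the entry above to a .bib file and reference it by its key. A minimal sketch, assuming the entry is stored in references.bib (a filename chosen here purely for illustration):

\documentclass{article}
\begin{document}
Shared memory communication across networks of MPSoCs~\cite{e3fbb8c106d5477b82ed6a7de27a0e62}.
\bibliographystyle{plain} % any standard BibTeX style works here
\bibliography{references} % assumes the entry above was saved as references.bib
\end{document}

Compiling with pdflatex, then bibtex, then pdflatex twice resolves the citation.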

RIS

TY - JOUR

T1 - Enabling Shared Memory Communication in Networks of MPSoCs

AU - Lant, Joshua

AU - Concatto, Caroline

AU - Attwood, Andrew

AU - Pascual Saiz, Jose

AU - Ashworth, Mike

AU - Navaridas, Javier

AU - Luján, Mikel

AU - Goodacre, Anthony

PY - 2018

Y1 - 2018

N2 - Ongoing transistor scaling and the growing complexity of embedded system designs have led to the rise of MPSoCs (Multi-Processor Systems-on-Chip), combining multiple hard-core CPUs and accelerators (FPGA, GPU) on the same physical die. These devices are of great interest to the supercomputing community, who are increasingly reliant on heterogeneity to achieve power and performance goals in these closing stages of the race to exascale. In this paper, we present a network interface architecture and networking infrastructure, designed to sit inside the FPGA fabric of a cutting-edge MPSoC device, enabling networks of these devices to communicate within both a distributed and shared memory context, with reduced need for costly software networking system calls. We present our implementation and prototype system and discuss the main design decisions relevant to the use of the Xilinx Zynq UltraScale+, a state-of-the-art MPSoC, and the challenges to be overcome given the device's limitations and constraints. We demonstrate the working prototype system connecting two MPSoCs, with communication between processor and remote memory region and accelerator. We then discuss the limitations of the current implementation and highlight areas of improvement to make this solution production-ready.

AB - Ongoing transistor scaling and the growing complexity of embedded system designs have led to the rise of MPSoCs (Multi-Processor Systems-on-Chip), combining multiple hard-core CPUs and accelerators (FPGA, GPU) on the same physical die. These devices are of great interest to the supercomputing community, who are increasingly reliant on heterogeneity to achieve power and performance goals in these closing stages of the race to exascale. In this paper, we present a network interface architecture and networking infrastructure, designed to sit inside the FPGA fabric of a cutting-edge MPSoC device, enabling networks of these devices to communicate within both a distributed and shared memory context, with reduced need for costly software networking system calls. We present our implementation and prototype system and discuss the main design decisions relevant to the use of the Xilinx Zynq UltraScale+, a state-of-the-art MPSoC, and the challenges to be overcome given the device's limitations and constraints. We demonstrate the working prototype system connecting two MPSoCs, with communication between processor and remote memory region and accelerator. We then discuss the limitations of the current implementation and highlight areas of improvement to make this solution production-ready.

U2 - 10.1002/cpe.4774

DO - 10.1002/cpe.4774

M3 - Article

JO - Concurrency and Computation: Practice and Experience

JF - Concurrency and Computation: Practice and Experience

SN - 1532-0626

ER -