In recent years there has been renewed interest in the use of Field Programmable Gate Arrays (FPGAs) for High Performance Computing (HPC). In this paper we explore the techniques required by traditional HPC programmers in porting HPC applications to FPGAs, using as an example the LFRic weather and climate model. We report on the first steps in porting LFRic to the FPGAs of the EuroExa architecture. We have used Vivado High Level Synthesis to implement a matrix-vector kernel from the LFRic code on a Xilinx UltraScale+ development board containing an XCZU9EG Multi-Processor System-on-a-Chip. We describe the porting of the code, discuss the optimization decisions, and report performance of 5.34 Gflop/s in double precision and 5.58 Gflop/s in single precision.
We discuss sources of inefficiency and compare our results with peak performance, with CPU and GPU performance (taking into account power and price), and with published techniques and performance figures. We conclude with some comments on the prospects for future progress with FPGA acceleration of the weather forecast model.
The realization of practical Exascale-class high-performance computing systems requires significant improvements in the energy efficiency of such systems and their components. This has generated interest in computer architectures that use accelerators alongside traditional CPUs. FPGAs offer considerable potential as accelerators that can deliver performance for scientific applications at high levels of energy efficiency. The EuroExa project is developing and building a high-performance architecture based upon ARM CPUs with FPGA acceleration, targeting Exascale-class performance within a realistic power budget.