Making a Case for an ARM Cortex-A9 CPU Interlay Replacing the NEON SIMD Unit
Citation formats
Standard
Making a Case for an ARM Cortex-A9 CPU Interlay Replacing the NEON SIMD Unit. / Garcia Ordaz, Jose Raul; Koch, Dirk.
International Conference on Field-Programmable Logic and Applications. 2017. (2017 27th International Conference on Field Programmable Logic and Applications (FPL)).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
RIS
TY - GEN
T1 - Making a Case for an ARM Cortex-A9 CPU Interlay Replacing the NEON SIMD Unit
AU - Garcia Ordaz, Jose Raul
AU - Koch, Dirk
N1 - Conference code: 2017
PY - 2017
Y1 - 2017
N2 - As an alternative of adding more and more instructions to CPU cores in order to address a wide range of applications, this paper examines to use a mixed grained CPU interlay fabric to provide reconfigurable instruction set extensions. In detail, we are examining to replace the hardened NEON SIMD unit of an ARM Cortex-A9 with an identical sized FPGA fabric. We show that by applying a set of optimizations, we are able to emulate original applications using NEON instructions at the same hardware cost and at very little performance drop by an interlay. Moreover we are demonstrating examples where special custom instructions running on a CPU-Interlay-hybrid are substantially outperforming the original hardened CPU-NEON-system, hence making a strong case to embed reconfigurability as a beneficial feature in future processors.
AB - As an alternative of adding more and more instructions to CPU cores in order to address a wide range of applications, this paper examines to use a mixed grained CPU interlay fabric to provide reconfigurable instruction set extensions. In detail, we are examining to replace the hardened NEON SIMD unit of an ARM Cortex-A9 with an identical sized FPGA fabric. We show that by applying a set of optimizations, we are able to emulate original applications using NEON instructions at the same hardware cost and at very little performance drop by an interlay. Moreover we are demonstrating examples where special custom instructions running on a CPU-Interlay-hybrid are substantially outperforming the original hardened CPU-NEON-system, hence making a strong case to embed reconfigurability as a beneficial feature in future processors.
U2 - 10.23919/FPL.2017.8056806
DO - 10.23919/FPL.2017.8056806
M3 - Conference contribution
T3 - 2017 27th International Conference on Field Programmable Logic and Applications (FPL)
BT - International Conference on Field-Programmable Logic and Applications
T2 - Field Programmable Logic and Applications
Y2 - 4 September 2017 through 8 September 2017
ER -