Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation

Citation formats

Standard

Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation. / Papadimitriou, Michail; Fumero Alfonso, Juan; Stratikopoulos, Athanasios; Kotselidis, Christos-Efthymios.

2021. 57-70. Paper presented at The 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’21).

Research output: Contribution to conference › Paper › peer-review

Harvard

Papadimitriou, M, Fumero Alfonso, J, Stratikopoulos, A & Kotselidis, C-E 2021, 'Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation', Paper presented at The 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’21), 16/04/21 - 16/04/21 pp. 57-70.

APA

Papadimitriou, M., Fumero Alfonso, J., Stratikopoulos, A., & Kotselidis, C-E. (Accepted/In press). Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation. 57-70. Paper presented at The 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’21).

Vancouver

Papadimitriou M, Fumero Alfonso J, Stratikopoulos A, Kotselidis C-E. Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation. 2021. Paper presented at The 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’21).

Author

Papadimitriou, Michail ; Fumero Alfonso, Juan ; Stratikopoulos, Athanasios ; Kotselidis, Christos-Efthymios. / Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation. Paper presented at The 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’21). 14 p.

Bibtex

@conference{8c2ba9c6cef342c59f49a87c97eaecd7,
title = "Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation",
abstract = "Although Graphics Processing Units (GPUs) have become pervasive for data-parallel workloads, the efficient exploitation of their tiered memory hierarchy requires explicit programming. The efficient utilization of different GPU memory tiers can yield higher performance at the expense of programmability since developers must have extended knowledge of the architectural details in order to utilize them. In this paper, we propose an alternative approach based on Just-In-Time (JIT) compilation to automatically and transparently exploit local memory allocation and data locality on GPUs. In particular, we present a set of compiler extensions that allow arbitrary Java programs to utilize local memory on GPUs without explicit programming. We prototype and evaluate our proposed solution in the context of TornadoVM against a set of benchmarks and GPU architectures, showcasing performance speedups of up to 2.5x compared to equivalent baseline implementations that do not utilize local memory or data locality. In addition, we compare our proposed solution against hand-written optimized OpenCL code to assess the upper bound of performance improvements that can be transparently achieved by JIT compilation without trading programmability. The results showcase that the proposed extensions can achieve up to 94% of the performance of the native code, highlighting the efficiency of the generated code.",
keywords = "GPU, JIT Compilation, Tiered-Memory",
author = "Michail Papadimitriou and {Fumero Alfonso}, Juan and Athanasios Stratikopoulos and Christos-Efthymios Kotselidis",
year = "2021",
month = apr,
day = "16",
language = "English",
pages = "57--70",
note = "The 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE{\textquoteright}21), VEE ; Conference date: 16-04-2021 Through 16-04-2021",
url = "https://conf.researchr.org/home/vee-2021",

}

RIS

TY - CONF

T1 - Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation

AU - Papadimitriou, Michail

AU - Fumero Alfonso, Juan

AU - Stratikopoulos, Athanasios

AU - Kotselidis, Christos-Efthymios

N1 - Conference code: 17

PY - 2021/4/16

Y1 - 2021/4/16

N2 - Although Graphics Processing Units (GPUs) have become pervasive for data-parallel workloads, the efficient exploitation of their tiered memory hierarchy requires explicit programming. The efficient utilization of different GPU memory tiers can yield higher performance at the expense of programmability since developers must have extended knowledge of the architectural details in order to utilize them. In this paper, we propose an alternative approach based on Just-In-Time (JIT) compilation to automatically and transparently exploit local memory allocation and data locality on GPUs. In particular, we present a set of compiler extensions that allow arbitrary Java programs to utilize local memory on GPUs without explicit programming. We prototype and evaluate our proposed solution in the context of TornadoVM against a set of benchmarks and GPU architectures, showcasing performance speedups of up to 2.5x compared to equivalent baseline implementations that do not utilize local memory or data locality. In addition, we compare our proposed solution against hand-written optimized OpenCL code to assess the upper bound of performance improvements that can be transparently achieved by JIT compilation without trading programmability. The results showcase that the proposed extensions can achieve up to 94% of the performance of the native code, highlighting the efficiency of the generated code.

AB - Although Graphics Processing Units (GPUs) have become pervasive for data-parallel workloads, the efficient exploitation of their tiered memory hierarchy requires explicit programming. The efficient utilization of different GPU memory tiers can yield higher performance at the expense of programmability since developers must have extended knowledge of the architectural details in order to utilize them. In this paper, we propose an alternative approach based on Just-In-Time (JIT) compilation to automatically and transparently exploit local memory allocation and data locality on GPUs. In particular, we present a set of compiler extensions that allow arbitrary Java programs to utilize local memory on GPUs without explicit programming. We prototype and evaluate our proposed solution in the context of TornadoVM against a set of benchmarks and GPU architectures, showcasing performance speedups of up to 2.5x compared to equivalent baseline implementations that do not utilize local memory or data locality. In addition, we compare our proposed solution against hand-written optimized OpenCL code to assess the upper bound of performance improvements that can be transparently achieved by JIT compilation without trading programmability. The results showcase that the proposed extensions can achieve up to 94% of the performance of the native code, highlighting the efficiency of the generated code.

KW - GPU

KW - JIT Compilation

KW - Tiered-Memory

M3 - Paper

SP - 57

EP - 70

T2 - The 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’21)

Y2 - 16 April 2021 through 16 April 2021

ER -