Accurate and Complete Hardware Profiling for OpenMP

Citation formats

Standard

Accurate and Complete Hardware Profiling for OpenMP. / Neill, Richard; Drebes, Andi; Pop, Antoniu.

Proceedings of the 13th International Workshop on OpenMP: Scaling OpenMP for Exascale Performance and Portability. Springer Nature, 2017. (Lecture Notes in Computer Science (LNCS)).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Harvard

Neill, R, Drebes, A & Pop, A 2017, Accurate and Complete Hardware Profiling for OpenMP. in Proceedings of the 13th International Workshop on OpenMP: Scaling OpenMP for Exascale Performance and Portability. Lecture Notes in Computer Science (LNCS), Springer Nature, 13th International Workshop on OpenMP, Stony Brook, United States, 18/09/17.

APA

Neill, R., Drebes, A., & Pop, A. (Accepted/In press). Accurate and Complete Hardware Profiling for OpenMP. In Proceedings of the 13th International Workshop on OpenMP: Scaling OpenMP for Exascale Performance and Portability (Lecture Notes in Computer Science (LNCS)). Springer Nature.

Vancouver

Neill R, Drebes A, Pop A. Accurate and Complete Hardware Profiling for OpenMP. In Proceedings of the 13th International Workshop on OpenMP: Scaling OpenMP for Exascale Performance and Portability. Springer Nature. 2017. (Lecture Notes in Computer Science (LNCS)).

Author

Neill, Richard ; Drebes, Andi ; Pop, Antoniu. / Accurate and Complete Hardware Profiling for OpenMP. Proceedings of the 13th International Workshop on OpenMP: Scaling OpenMP for Exascale Performance and Portability. Springer Nature, 2017. (Lecture Notes in Computer Science (LNCS)).

Bibtex

@inproceedings{f3f373b0544f42ca8d92c939ce1d692e,
title = "Accurate and Complete Hardware Profiling for OpenMP",
abstract = "Analyzing the behavior of OpenMP programs and their interaction with the hardware is essential for locating performance bottlenecks and identifying performance optimization opportunities. However, current architectures only provide a small number of dedicated registers to quantify hardware events, which strongly limits the scope of performance analyses. Hardware event multiplexing can help cover more events, but incurs a significant loss of accuracy and introduces overheads that change the behavior of program execution significantly. In this paper, we present an implementation of our technique for building a unique, coherent profile that contains all available hardware events from multiple executions of the same OpenMP program, each monitoring only a subset of the available hardware events. Reconciliation of the execution profiles relies on a new labeling scheme for OpenMP that uniquely identifies each dynamic unit of work across executions under dynamic scheduling across processing units. We show that our approach yields significantly better accuracy and lower monitoring overhead per execution than hardware event multiplexing.",
keywords = "Performance analysis, Hardware events, Performance monitoring counters, OpenMP profiling",
author = "Richard Neill and Andi Drebes and Antoniu Pop",
year = "2017",
month = may,
day = "26",
language = "English",
series = "Lecture Notes in Computer Science (LNCS)",
publisher = "Springer Nature",
booktitle = "Proceedings of the 13th International Workshop on OpenMP",
address = "United States",
note = "13th International Workshop on OpenMP, IWOMP ; Conference date: 18-09-2017 Through 22-09-2017",
url = "https://you.stonybrook.edu/iwomp2017/",
}

RIS

TY - GEN

T1 - Accurate and Complete Hardware Profiling for OpenMP

AU - Neill, Richard

AU - Drebes, Andi

AU - Pop, Antoniu

N1 - Conference code: 13

PY - 2017/5/26

Y1 - 2017/5/26

N2 - Analyzing the behavior of OpenMP programs and their interaction with the hardware is essential for locating performance bottlenecks and identifying performance optimization opportunities. However, current architectures only provide a small number of dedicated registers to quantify hardware events, which strongly limits the scope of performance analyses. Hardware event multiplexing can help cover more events, but incurs a significant loss of accuracy and introduces overheads that change the behavior of program execution significantly. In this paper, we present an implementation of our technique for building a unique, coherent profile that contains all available hardware events from multiple executions of the same OpenMP program, each monitoring only a subset of the available hardware events. Reconciliation of the execution profiles relies on a new labeling scheme for OpenMP that uniquely identifies each dynamic unit of work across executions under dynamic scheduling across processing units. We show that our approach yields significantly better accuracy and lower monitoring overhead per execution than hardware event multiplexing.

AB - Analyzing the behavior of OpenMP programs and their interaction with the hardware is essential for locating performance bottlenecks and identifying performance optimization opportunities. However, current architectures only provide a small number of dedicated registers to quantify hardware events, which strongly limits the scope of performance analyses. Hardware event multiplexing can help cover more events, but incurs a significant loss of accuracy and introduces overheads that change the behavior of program execution significantly. In this paper, we present an implementation of our technique for building a unique, coherent profile that contains all available hardware events from multiple executions of the same OpenMP program, each monitoring only a subset of the available hardware events. Reconciliation of the execution profiles relies on a new labeling scheme for OpenMP that uniquely identifies each dynamic unit of work across executions under dynamic scheduling across processing units. We show that our approach yields significantly better accuracy and lower monitoring overhead per execution than hardware event multiplexing.

KW - Performance analysis

KW - Hardware events

KW - Performance monitoring counters

KW - OpenMP profiling

M3 - Conference contribution

T3 - Lecture Notes in Computer Science (LNCS)

BT - Proceedings of the 13th International Workshop on OpenMP

PB - Springer Nature

T2 - 13th International Workshop on OpenMP

Y2 - 18 September 2017 through 22 September 2017

ER -