Fuse: Accurate Multiplexing of Hardware Performance Counters Across Executions

Research output: Working paper


Collecting hardware event counts is essential for understanding program execution behavior, analyzing interactions with the hardware, and devising effective optimizations. However, the Performance Monitoring Counters (PMCs) available on contemporary systems can monitor only a small fraction of hardware events simultaneously. This either limits analyses and insight to a very narrow window or incurs high accuracy and overhead penalties when PMCs are time-multiplexed. We show that by multiplexing PMCs across multiple executions of the same program, and carefully reconciling and merging the resulting profiles, it is possible to acquire counts for all available hardware events while achieving high accuracy. Our multi-execution-profile combination scheme can use different policies to merge information from different executions into a single coherent profile, and we show that our new behavior clustering technique achieves the highest accuracy. We further present a new accuracy metric, based on the Earth Mover’s Distance, that quantifies the (dis)similarity between execution profiles obtained through different techniques and a baseline in which events are monitored without multiplexing. We provide a quantitative analysis of the accuracy of hardware event profiles generated using the Linux perf_event hardware event multiplexing implementation, as well as those generated by our combination technique. We show that hardware event multiplexing produces highly inaccurate execution profiles, even when it is used to monitor only a few more hardware events than can be monitored simultaneously. We find that our combination technique produces significantly more accurate execution profiles that contain all available hardware events, allowing for greater reliability and completeness in subsequent performance analyses.
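To make the Earth Mover’s Distance idea concrete, the following is a minimal sketch and not the paper's actual metric, whose exact formulation is not given in this abstract. It assumes, purely for illustration, that an execution profile for one hardware event is a list of counts over equally sized time intervals, and it computes the standard one-dimensional EMD between two such profiles after normalizing each to unit mass. The function names `normalize` and `emd_1d` are hypothetical.

```python
from itertools import accumulate


def normalize(counts):
    """Scale a list of non-negative counts so that they sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("profile must contain a positive total count")
    return [c / total for c in counts]


def emd_1d(profile_a, profile_b):
    """One-dimensional Earth Mover's Distance between two profiles.

    Assumption (not from the paper): both profiles are event counts
    over the same sequence of equally sized intervals. With unit
    spacing between intervals, the 1-D EMD equals the sum of absolute
    differences between the two cumulative distributions.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same number of intervals")
    cdf_a = accumulate(normalize(profile_a))
    cdf_b = accumulate(normalize(profile_b))
    return sum(abs(x - y) for x, y in zip(cdf_a, cdf_b))


# Hypothetical data: a baseline profile and a profile whose activity
# is shifted one interval later.
baseline = [1, 0, 0]
shifted = [0, 1, 0]
distance = emd_1d(baseline, shifted)
```

Under these assumptions, a distance of zero means the two normalized profiles are identical, and the distance grows with how far event activity has to be moved in time to turn one profile into the other.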

Bibliographical metadata

Original language: English
State: Published - 2017