Vectorization-Aware Loop Unrolling with Seed Forwarding

Citation formats

  • External authors:
  • Rodrigo C. O. Rocha
  • Vasileios Porpodas
  • Luís F. W. Góes
  • Zheng Wang
  • Murray Cole
  • Hugh Leather

Standard

Vectorization-Aware Loop Unrolling with Seed Forwarding. / Rocha, Rodrigo C. O.; Porpodas, Vasileios; Petoumenos, Pavlos; Góes, Luís F. W.; Wang, Zheng; Cole, Murray; Leather, Hugh.

Proceedings of the ACM SIGPLAN 2020 International Conference on Compiler Construction. 2019.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Harvard

Rocha, RCO, Porpodas, V, Petoumenos, P, Góes, LFW, Wang, Z, Cole, M & Leather, H 2019, Vectorization-Aware Loop Unrolling with Seed Forwarding. in Proceedings of the ACM SIGPLAN 2020 International Conference on Compiler Construction.

APA

Rocha, R. C. O., Porpodas, V., Petoumenos, P., Góes, L. F. W., Wang, Z., Cole, M., & Leather, H. (Accepted/In press). Vectorization-Aware Loop Unrolling with Seed Forwarding. In Proceedings of the ACM SIGPLAN 2020 International Conference on Compiler Construction

Vancouver

Rocha RCO, Porpodas V, Petoumenos P, Góes LFW, Wang Z, Cole M et al. Vectorization-Aware Loop Unrolling with Seed Forwarding. In Proceedings of the ACM SIGPLAN 2020 International Conference on Compiler Construction. 2019

Author

Rocha, Rodrigo C. O. ; Porpodas, Vasileios ; Petoumenos, Pavlos ; Góes, Luís F. W. ; Wang, Zheng ; Cole, Murray ; Leather, Hugh. / Vectorization-Aware Loop Unrolling with Seed Forwarding. Proceedings of the ACM SIGPLAN 2020 International Conference on Compiler Construction. 2019.

Bibtex

@inproceedings{a36820cafa1b4b3bbcb40b9b7e3a926e,
title = "Vectorization-Aware Loop Unrolling with Seed Forwarding",
abstract = "Loop unrolling is a widely adopted loop transformation, commonly used for enabling subsequent optimizations. Straight-line-code vectorization (SLP) is an optimization that benefits from unrolling. SLP converts isomorphic instruction sequences into vector code. Since unrolling generates repeated isomorphic instruction sequences, it enables SLP to vectorize more code. However, most production compilers apply these optimizations independently and uncoordinated. Unrolling is commonly tuned to avoid code bloat, not maximizing the potential for vectorization, leading to missed vectorization opportunities. We are proposing VALU, a novel loop unrolling heuristic that takes vectorization into account when making unrolling decisions. Our heuristic is powered by an analysis that estimates the potential benefit of SLP vectorization for the unrolled version of the loop. Our heuristic then selects the unrolling factor that maximizes the utilization of the vector units. VALU also forwards the vectorizable code to SLP, allowing it to bypass its greedy search for vectorizable seed instructions, exposing more vectorization opportunities. Our evaluation on a production compiler shows that VALU uncovers many vectorization opportunities that were missed by the default loop unroller and vectorizers. This results in more vectorized code and significant performance speedups for 17 of the kernels of the TSVC benchmarks suite, reaching up to 2× speedup over the already highly optimized -O3. Our evaluation on full benchmarks from FreeBench and MiBench shows that VALU results in a geo-mean speedup of 1.06.",
author = "Rocha, {Rodrigo C. O.} and Vasileios Porpodas and Pavlos Petoumenos and G{\'o}es, {Lu{\'i}s F. W.} and Zheng Wang and Murray Cole and Hugh Leather",
year = "2019",
month = dec,
day = "23",
language = "English",
booktitle = "Proceedings of the ACM SIGPLAN 2020 International Conference on Compiler Construction",

}

RIS

TY - GEN

T1 - Vectorization-Aware Loop Unrolling with Seed Forwarding

AU - Rocha, Rodrigo C. O.

AU - Porpodas, Vasileios

AU - Petoumenos, Pavlos

AU - Góes, Luís F. W.

AU - Wang, Zheng

AU - Cole, Murray

AU - Leather, Hugh

PY - 2019/12/23

Y1 - 2019/12/23

N2 - Loop unrolling is a widely adopted loop transformation, commonly used for enabling subsequent optimizations. Straight-line-code vectorization (SLP) is an optimization that benefits from unrolling. SLP converts isomorphic instruction sequences into vector code. Since unrolling generates repeated isomorphic instruction sequences, it enables SLP to vectorize more code. However, most production compilers apply these optimizations independently and uncoordinated. Unrolling is commonly tuned to avoid code bloat, not maximizing the potential for vectorization, leading to missed vectorization opportunities. We are proposing VALU, a novel loop unrolling heuristic that takes vectorization into account when making unrolling decisions. Our heuristic is powered by an analysis that estimates the potential benefit of SLP vectorization for the unrolled version of the loop. Our heuristic then selects the unrolling factor that maximizes the utilization of the vector units. VALU also forwards the vectorizable code to SLP, allowing it to bypass its greedy search for vectorizable seed instructions, exposing more vectorization opportunities. Our evaluation on a production compiler shows that VALU uncovers many vectorization opportunities that were missed by the default loop unroller and vectorizers. This results in more vectorized code and significant performance speedups for 17 of the kernels of the TSVC benchmarks suite, reaching up to 2× speedup over the already highly optimized -O3. Our evaluation on full benchmarks from FreeBench and MiBench shows that VALU results in a geo-mean speedup of 1.06.

AB - Loop unrolling is a widely adopted loop transformation, commonly used for enabling subsequent optimizations. Straight-line-code vectorization (SLP) is an optimization that benefits from unrolling. SLP converts isomorphic instruction sequences into vector code. Since unrolling generates repeated isomorphic instruction sequences, it enables SLP to vectorize more code. However, most production compilers apply these optimizations independently and uncoordinated. Unrolling is commonly tuned to avoid code bloat, not maximizing the potential for vectorization, leading to missed vectorization opportunities. We are proposing VALU, a novel loop unrolling heuristic that takes vectorization into account when making unrolling decisions. Our heuristic is powered by an analysis that estimates the potential benefit of SLP vectorization for the unrolled version of the loop. Our heuristic then selects the unrolling factor that maximizes the utilization of the vector units. VALU also forwards the vectorizable code to SLP, allowing it to bypass its greedy search for vectorizable seed instructions, exposing more vectorization opportunities. Our evaluation on a production compiler shows that VALU uncovers many vectorization opportunities that were missed by the default loop unroller and vectorizers. This results in more vectorized code and significant performance speedups for 17 of the kernels of the TSVC benchmarks suite, reaching up to 2× speedup over the already highly optimized -O3. Our evaluation on full benchmarks from FreeBench and MiBench shows that VALU results in a geo-mean speedup of 1.06.

M3 - Conference contribution

BT - Proceedings of the ACM SIGPLAN 2020 International Conference on Compiler Construction

ER -