Transparent acceleration of Java-based deep learning engines

Research output: Contribution to conference › Paper › peer-review

  • External authors:
  • Athanasios Stratikopoulos
  • Mihai-Cristian Olteanu
  • Ian Vaughan
  • Zoran Sevarac
  • Nikolaos Foutris


The advent of modern cloud services, along with the huge volume of data produced on a daily basis, has increased the demand for fast and efficient data processing. This demand is common among numerous application domains, such as deep learning, data mining, and computer vision. In recent years, hardware accelerators have been employed as a means to meet this demand, due to the high parallelism that these applications exhibit. Although this approach can yield high performance, the development of new deep learning neural networks on heterogeneous hardware requires a steep learning curve. The main reason is that existing deep learning engines support the static compilation of the accelerated code, which can be accessed via wrapper calls from a wide range of managed programming languages (e.g., Java, Python, Scala). Therefore, the development of high-performance neural network architectures is fragmented between programming models, thereby forcing developers to manually specialize the code for heterogeneous execution. This specialization is not a trivial task, as it requires developers to have hardware expertise and to use a low-level programming language, such as OpenCL, CUDA, or High Level Synthesis (HLS) tools. In this paper we showcase how we have employed TornadoVM, a state-of-the-art heterogeneous programming framework, to transparently accelerate Deep Netts on heterogeneous hardware. Our work shows how a pure Java-based deep learning neural network engine can be dynamically compiled at runtime and specialized for particular hardware accelerators, without requiring developers to employ any low-level programming framework typically used for such devices. Our preliminary results show up to 6.45x end-to-end performance speedup and up to 88.5x kernel performance speedup when executing the feed-forward process of the network's training on GPUs, compared with the sequential execution of the original Deep Netts framework.
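The feed-forward step that the abstract reports accelerating is, at its core, a dense matrix-vector multiplication followed by an activation function. The following minimal Java sketch (not taken from the paper; class and method names are illustrative) shows such a layer written as a plain sequential loop — the kind of data-parallel loop that a framework like TornadoVM can JIT-compile for a GPU transparently, e.g. by marking the outer loop with its @Parallel annotation, rather than rewriting it in OpenCL or CUDA:

```java
// Minimal sketch of one feed-forward (dense) layer: out = sigmoid(W * in + b).
// Plain sequential Java; under TornadoVM the outer loop would be a candidate
// for the @Parallel annotation so the runtime can specialize it for a GPU.
public class FeedForward {
    static void forward(float[] weights, float[] in, float[] bias, float[] out) {
        int rows = out.length, cols = in.length;
        for (int i = 0; i < rows; i++) {          // candidate parallel loop
            float sum = bias[i];
            for (int j = 0; j < cols; j++) {
                sum += weights[i * cols + j] * in[j]; // row-major weight matrix
            }
            out[i] = 1.0f / (1.0f + (float) Math.exp(-sum)); // sigmoid activation
        }
    }

    public static void main(String[] args) {
        float[] w   = {0.5f, -0.25f, 1.0f, 0.75f}; // 2x2 weights, row-major
        float[] in  = {1.0f, 2.0f};
        float[] b   = {0.0f, 0.0f};
        float[] out = new float[2];
        forward(w, in, b, out);
        // First neuron: 0.5*1 - 0.25*2 = 0, and sigmoid(0) = 0.5
        System.out.println(out[0] + " " + out[1]);
    }
}
```

Because the loop body is free of side effects and each output element is independent, this is exactly the shape of computation where the paper's "transparent" acceleration applies: the Java source stays unchanged apart from an annotation, and the heterogeneous specialization happens at runtime.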

Bibliographical metadata

Original language: English
Publication status: Accepted/In press - 23 Sep 2020
Event: International Conference on Managed Programming Languages & Runtimes - Manchester, United Kingdom
Event duration: 4 Nov 2020 – 6 Nov 2020


Conference: International Conference on Managed Programming Languages & Runtimes
Abbreviated title: MPLR ’20
Country: United Kingdom