There is strong demand for on-device execution of deep learning algorithms on mobile and embedded platforms. These devices constrain applications through limited hardware resources and power budgets. However, current evaluation studies of existing deep learning frameworks (for example, Caffe, TensorFlow, and Torch) are limited to performance measurements on high-end CPUs and GPUs. In this work, we propose
"SyNERGY", a fine-grained (that is, per-layer) energy measurement and prediction framework for deep neural networks on embedded platforms. We integrate ARM’s Streamline Performance Analyser with standard deep learning
frameworks such as Caffe and CuDNNv5 to quantify the energy use of deep convolutional neural networks on the Nvidia Jetson Tegra X1. Our measurement framework provides an accurate breakdown of actual energy consumption
and performance across all layers in the neural network, while our prediction framework models energy use in terms of target-specific performance counters, such as SIMD instructions and bus accesses, and application-specific parameters, such as multiply-accumulate (MAC) counts. Our experimental results on nine representative deep convolutional neural networks show that a multi-variable linear regression model based on hardware performance counters alone achieves an average prediction test error of 8.0 ± 5.96% compared to actual energy measurements. Surprisingly, we find that it is possible to refine the model to predict the number of SIMD instructions and main memory accesses solely from
the application’s MAC counts, with average prediction test errors of 0.81 ± 0.77% and 17.09 ± 13%, respectively. This alleviates the need for actual measurements, giving a final average prediction test error of 7.0 ± 6.0% using solely the application’s MAC counts as input.
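
To make the modelling idea concrete, the following is a minimal sketch of a multi-variable linear regression that maps per-layer features (for example, performance-counter values or MAC counts) to energy. It is not the SyNERGY implementation: the function names, the feature layout, and the synthetic data are all assumptions chosen for illustration, and the MAC formula covers only a standard ungrouped convolution layer.

```python
import numpy as np


def conv_mac_count(out_h, out_w, out_c, in_c, k_h, k_w):
    """MACs for a standard (ungrouped) convolution layer.

    Each output element needs one multiply-accumulate per kernel weight
    across all input channels. Hypothetical helper, not from the paper.
    """
    return out_h * out_w * out_c * in_c * k_h * k_w


def fit_linear_model(features, targets):
    """Ordinary least squares with an intercept term.

    features: array of shape (n_samples, n_features)
    targets:  array of shape (n_samples,)
    Returns the coefficient vector with the intercept as the last entry.
    """
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict(coef, features):
    """Apply a model returned by fit_linear_model to new feature rows."""
    design = np.column_stack([features, np.ones(len(features))])
    return design @ coef


def mean_relative_error_pct(predicted, actual):
    """Average absolute relative error, expressed in percent."""
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)) * 100.0)


if __name__ == "__main__":
    # Synthetic placeholder data (NOT measurements): two made-up features
    # standing in for quantities such as SIMD instruction and bus access counts.
    rng = np.random.default_rng(0)
    train_x = rng.uniform(1.0, 10.0, size=(20, 2))
    train_y = 2.0 * train_x[:, 0] + 3.0 * train_x[:, 1] + 1.0

    model = fit_linear_model(train_x, train_y)
    test_x = rng.uniform(1.0, 10.0, size=(5, 2))
    test_y = 2.0 * test_x[:, 0] + 3.0 * test_x[:, 1] + 1.0
    print("coefficients:", model)
    print("test error (%):", mean_relative_error_pct(predict(model, test_x), test_y))
```

With real data, the feature columns would be replaced by measured counter values or by per-layer MAC counts, and the error would be computed on held-out networks rather than on synthetic samples.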