Nowadays, most programmable systems contain multiple hardware accelerators with different characteristics. To exploit the available hardware resources and improve the performance of their applications, developers must use a low-level language such as C/C++. Achieving the same goal from a high-level managed language (Java, Haskell, C#) poses several challenges, such as the inability to perform asynchronous data transfers and to declare pinned memory. As a result, hardware acceleration is not yet well established in managed languages. Recently, frameworks that run on top of managed runtime systems have been developed, enabling the acceleration of high-level programming languages on heterogeneous hardware. This project analyzes one particular aspect of hardware acceleration in the context of managed runtimes, namely memory transfers between the host and the device, and proposes two solutions for improving them. The first solution enhances TornadoVM, a heterogeneous managed runtime system, to support the allocation of pinned off-heap buffers and batch processing that overlaps computation with data transfers. Using pinned memory improves data transfer performance by up to 50%. Additionally, combining pinned memory with parallel batching achieves an end-to-end speedup of up to 2.5x over sequential batches. The second solution extends MaxineVM to allocate its heap through CUDA Unified Memory, allowing Java objects resident in the heap to be accessed by the GPU. Compared with sequential Java execution, this yields an end-to-end speedup of up to 134x and a garbage collection slowdown of 2.45x.