The constant growth of data is pushing storage systems towards ever-higher I/O bandwidth and ever-lower latency. In recent years, the Non-Volatile Memory Express (NVMe) standard has enabled SSDs to deliver high I/O rates by allowing storage to be connected directly to the processing chip via the fastest available interconnect (i.e. PCIe). Although SSDs have become ubiquitous in data centres, reducing the latency gap with main memory is still a first-order challenge. At the same time, the adoption of FPGAs in data centres is creating opportunities to accelerate applications and OS operations. While FPGAs in data centres have mostly been connected via PCIe to x86 servers, there are now heterogeneous Systems on Chip (SoCs) in which multi-core processors and FPGAs are integrated on the same die and connected by an on-chip interconnect. This thesis analyses the sources of performance overhead in existing state-of-the-art storage devices and proposes a novel low-overhead, energy-efficient storage path called FastPath, which operates transparently to the main processor. Experimental results show that FastPath can achieve up to 82% lower latency, up to 12x higher throughput, and up to 10x higher energy efficiency for a standard microbenchmark on a Xilinx Zynq 7000 SoC. Further experiments have been conducted on a state-of-the-art SoC (the Xilinx Zynq UltraScale+ MPSoC) using a real application, the Redis in-memory database. Redis is configured to deliver requests issued by the Yahoo! Cloud Serving Benchmark (YCSB) to the storage device via FastPath. The experimental evaluation shows that FastPath achieves up to 60% lower tail latency and 15% higher throughput than the baseline storage path in the Linux kernel.