FPGAs have been around for over 30 years and are a viable accelerator for compute-intensive workloads on HPC systems. The adoption of FPGAs for scientific applications has been stimulated recently by the emergence of better programming environments such as High-Level Synthesis (HLS) and OpenCL available through the Xilinx SDSoC design tool. The mapping of the multi-level concurrency available within applications onto HPC systems with FPGAs is a challenge. OpenCL and HLS provide different mechanisms for exploiting concurrency within a node leading to a concurrency mapping design problem. In addition to considering the performance of different mappings, there are also questions of resource usage, programmability (development effort), ease-of-use and robustness. This paper examines the concurrency levels available in a case study kernel from a shallow water model and explores the programming options available in OpenCL and HLS. We conclude that the use of SDSoC Data flow over functions mechanism, targeting functional parallelism in the kernel, provides the best performance in terms of both Latency and execution time, with a speedup of 314x over the naïve reference implementation.