Progress in microprocessors has moved towards increasing the number of cores, heterogeneity, and bitness of computing. The performance, programmability and energy efficiency of the next generations of microprocessors will depend heavily on efficient synchronisation, hardware virtualisation, memory subsystem utilisation and closer synergy between hardware and software. This multidisciplinary work addresses these challenges by advancing the state of the art in three major fields of computer science: shared-memory synchronisation, computer architecture simulation, and high-level language computer architecture.
Firstly, this thesis presents a study of state-of-the-art barrier synchronisation algorithms specialised for the Intel Xeon Phi architecture. The proposed novel hybrid barrier synchronisation algorithm exploits the topology, the memory hierarchy, and other capabilities of the Intel Xeon Phi 5110P coprocessor. The algorithm achieves a 3.28x lower overhead than the Intel OpenMP barrier implementation (ICC 14.0.0) and outperforms all other known implementations. The study also investigates design issues of the Intel Xeon Phi with respect to barrier synchronisation. Furthermore, the thesis introduces an extensible parameterised framework for the empirical evaluation of barrier synchronisation algorithms on different systems; the framework is released as free software.
Secondly, a novel open-source simulation platform named MaxSim is introduced. MaxSim facilitates hardware/software co-design of managed runtime environments and architectures with tagged pointer support. It is aware of the managed runtime environment, supports fast simulation of tagged pointers on the x86-64 architecture, and allows new hardware extensions to be modelled, microarchitectural profiling to be performed, and complex software changes to be modelled via a novel address-space morphing technique. MaxSim is available as free software.
Finally, the work explores hardware/software co-design opportunities of managed runtime environments and architectures with tagged pointer support. It is shown how an array length can be stored in a tagged pointer and efficiently retrieved from it with the assistance of hardware extensions in Java Virtual Machine implementations. The proposed technique reduces dynamic energy by up to 4% (2% geometric mean) and L1 data cache loads by up to 14% (7% geometric mean). The work also investigates how tagged pointers can be used for storing type information in Java Virtual Machine implementations. In addition, novel hardware extensions to the address generation and load-store units are proposed to achieve low-overhead type information retrieval and compression-decompression of tagged object pointers. The evaluation shows heap space savings of up to 26% (10% geometric mean), DRAM dynamic energy reduction of up to 50% (12% geometric mean), and execution time reduction of up to 49% (3% geometric mean).
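The idea of keeping an array length in a tagged pointer can be illustrated with a minimal software-only sketch. It is not the mechanism evaluated in the thesis, which relies on hardware extensions; it only shows the bit layout one might assume. The class name `TaggedArrayPointer`, the 48-bit address width, the 16-bit tag, and the overflow sentinel are all hypothetical choices made for illustration.

```java
public final class TaggedArrayPointer {
    // Assumption: virtual addresses use the low 48 bits, leaving 16 bits for a tag.
    public static final int ADDRESS_BITS = 48;
    public static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    // Hypothetical sentinel: the length did not fit in the tag and must be
    // read from the array header in memory instead.
    public static final int OVERFLOW_TAG = 0xFFFF;

    private TaggedArrayPointer() {}

    // Pack an array length into the upper 16 bits of a 64-bit pointer.
    public static long tag(long address, int length) {
        int tag = (length >= 0 && length < OVERFLOW_TAG) ? length : OVERFLOW_TAG;
        return ((long) tag << ADDRESS_BITS) | (address & ADDRESS_MASK);
    }

    // Strip the tag to recover the plain address.
    public static long address(long tagged) {
        return tagged & ADDRESS_MASK;
    }

    // Return the length held in the tag, or -1 if the sentinel says
    // "fall back to the in-memory header".
    public static int lengthOrMinusOne(long tagged) {
        int tag = (int) (tagged >>> ADDRESS_BITS);
        return tag == OVERFLOW_TAG ? -1 : tag;
    }
}
```

In this sketch a bounds check on a short array would read the length from the pointer itself rather than from the array header, which is the kind of saving in L1 data cache loads the abstract reports; the thesis performs the extraction with hardware assistance rather than with explicit shifts and masks.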