Application demand for memory capacity has always challenged the available technologies. It is therefore important to understand that this demand, and the limitations it exposed, led to the appearance of new memory technologies and system designs. Fundamentally, no single solution has fully solved the memory capacity challenge. As argued in this survey paper, limitations imposed by physical laws make it impossible to keep expanding local off-chip memory without adopting new approaches. The Non-Uniform Memory Access (NUMA) architecture provides more system memory by using pools of processors, each with its own memory, to work around the physical constraints on a single processor, but the additional system complexity and cost led to scalability issues that deter further system expansion using this method.
Computer clusters were the first configurations to provide a Distributed Shared Memory (DSM) system at linear cost while also being more scalable than traditional cache-coherent NUMA systems. This was achieved, however, through additional software mechanisms that introduce significant latency when accessing the increased memory capacity. As we describe, since the initial software DSM systems, considerable effort has been invested in creating simpler and higher-performance solutions, including software libraries, language extensions, high-performance interconnects, and abstractions via system hypervisors, each of which allows memory resources to be allocated and used more efficiently across the nodes of a machine cluster.
Despite such efforts, none of the current approaches can efficiently solve the fundamental problem of maintaining cache coherence across a scaled system with thousands of nodes, and therefore delivering scalable memory capacity still poses a real challenge for system architects. New design concepts and technologies, such as 3D stacked RAM and the Unimem architecture, are promising and can offer a substantial increase in performance and memory capacity, but there is still no generally accepted solution for providing efficient Distributed Shared Memory.