The vast gap between CPU and RAM speed means that, on modern architectures, developers need to carefully consider data placement in memory to exploit spatial and temporal cache locality and use CPU caches effectively. To that end, developers have devised various data placement strategies. For objects that should be close in memory, a contiguous pool of objects is allocated and new instances are then constructed inside it. Alternatively, an array of objects is clustered into multiple arrays, each holding the values of a specific field of the objects. Such data placements, however, have to be performed manually; as a result, readability, maintainability, memory safety, and key OO concepts such as encapsulation and object identity are sacrificed, and the business logic has to be modified accordingly.
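To make the manual clustering strategy concrete, the following C++ sketch contrasts an array of objects with the same data split into one array per field. The `Particle` type, its fields, and the function names are hypothetical and chosen only for illustration; the sketch is not taken from the case studies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical object type, used only for illustration.
struct Particle {
    float x, y;    // position
    float vx, vy;  // velocity
};

// Array of objects: all fields of one object are adjacent in memory.
// Summing x alone still fetches cache lines that also carry y, vx and vy.
float sum_x_aos(const std::vector<Particle>& ps) {
    float sum = 0.0f;
    for (const Particle& p : ps) sum += p.x;
    return sum;
}

// Manually clustered version: one array per field.
// An object is now just an index shared by the four arrays.
struct ParticlesSoA {
    std::vector<float> x, y, vx, vy;

    void push_back(const Particle& p) {
        x.push_back(p.x);
        y.push_back(p.y);
        vx.push_back(p.vx);
        vy.push_back(p.vy);
    }

    std::size_t size() const { return x.size(); }
};

// Summing x now touches only the contiguous array of x values.
float sum_x_soa(const ParticlesSoA& ps) {
    float sum = 0.0f;
    for (float v : ps.x) sum += v;
    return sum;
}
```

Note that the second version no longer has a `Particle` object to pass around: code that used to take a reference to an object must be rewritten to take a container and an index, which is the kind of change to the business logic that manual placement forces.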
We propose a language extension, SHAPES, which aims to offer developers high-level fine-grained control over data placement, whilst retaining memory safety and the look-and-feel of OO. SHAPES extends an OO language with the concepts of pools and layouts: Developers declare pools that contain objects of a specific type and specify the pool’s layout. A layout specifies how objects in a pool are laid out in memory. That is, it dictates how the values of the fields of the pool’s objects are grouped together into clusters. Objects stored in pools behave identically to ordinary, standalone objects; the type system allows the code to be oblivious to the layout being used. This means that the business logic is completely decoupled from any placement concerns and the developer need not deviate from the spirit of OO to better utilise the cache.
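As a rough analogy for layout-oblivious business logic, the C++ sketch below writes one routine against two containers that store the same fields with different layouts. This is not SHAPES syntax, and SHAPES achieves the decoupling through its type system rather than through templates; the names `RecordPool`, `ClusteredPool`, and `advance` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout 1: the fields of each object are stored together.
class RecordPool {
    struct Rec { float pos; float vel; };
    std::vector<Rec> recs_;
public:
    std::size_t add(float pos, float vel) {
        recs_.push_back({pos, vel});
        return recs_.size() - 1;
    }
    std::size_t size() const { return recs_.size(); }
    float pos(std::size_t i) const { return recs_[i].pos; }
    float vel(std::size_t i) const { return recs_[i].vel; }
    void set_pos(std::size_t i, float v) { recs_[i].pos = v; }
};

// Hypothetical layout 2: each field is stored in its own cluster (array).
class ClusteredPool {
    std::vector<float> pos_, vel_;
public:
    std::size_t add(float pos, float vel) {
        pos_.push_back(pos);
        vel_.push_back(vel);
        return pos_.size() - 1;
    }
    std::size_t size() const { return pos_.size(); }
    float pos(std::size_t i) const { return pos_[i]; }
    float vel(std::size_t i) const { return vel_[i]; }
    void set_pos(std::size_t i, float v) { pos_[i] = v; }
};

// Business logic written once; it never mentions how the pool is laid out.
template <typename Pool>
void advance(Pool& pool, float dt) {
    for (std::size_t i = 0; i < pool.size(); ++i) {
        pool.set_pos(i, pool.pos(i) + pool.vel(i) * dt);
    }
}
```

Switching layout here means changing only the container type at the declaration site, while `advance` stays untouched. Unlike this sketch, which exposes objects as indices and accessor calls, SHAPES aims to keep pooled objects behaving like ordinary objects.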
In this paper, we present the features of SHAPES, as well as the design rationale behind each feature. We then showcase the merit of SHAPES through a sequence of case studies; we claim that, compared to manual pooling and clustering of objects, SHAPES improves readability and maintainability while achieving comparable (i.e., on par or better) performance.
We also present SHAPES^h, an OO calculus which models the SHAPES ideas; we formalise its type system and prove soundness. The SHAPES^h type system uses ideas from Ownership Types [Clarke et al., 2013] and Java Generics [Gosling et al., 2014]: in SHAPES^h, pools are part of the types, and SHAPES^h class and type definitions are enriched with pool parameters. Moreover, class pool parameters are enriched with bounds, which is what allows the business logic of SHAPES to be oblivious to the layout being used. SHAPES^h types also enforce pool uniformity and homogeneity. A pool is uniform if it contains objects of the same class only; a pool is homogeneous if the corresponding fields of all its objects point to objects in the same pool. These properties allow for more efficient implementation.
For performance reasons, we also designed SHAPES^l, an untyped, unsafe low-level language with no explicit support for objects or pools. We argue that it is possible to translate SHAPES^l into existing low-level intermediate representations, such as LLVM [Lattner and Adve, 2004]; we also present the translation of SHAPES^h into SHAPES^l and show its soundness.
Thus, we expect SHAPES to offer developers more fine-grained control over data placement, without sacrificing memory safety or the OO look-and-feel.