Remote Memory References at Block Granularity

Attiya, Hagit; Yavneh, Gili

doi:10.4230/LIPIcs.OPODIS.2017.18

File

LIPIcs.OPODIS.2017.18.pdf

Filesize: 0.51 MB
17 pages

Document Identifiers

DOI: 10.4230/LIPIcs.OPODIS.2017.18
URN: urn:nbn:de:0030-drops-86538

Author Details

Hagit Attiya

Gili Yavneh

Cite As

Hagit Attiya and Gili Yavneh. Remote Memory References at Block Granularity. In 21st International Conference on Principles of Distributed Systems (OPODIS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 95, pp. 18:1-18:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.OPODIS.2017.18

Abstract

The cost of accessing shared objects stored in remote memory, ignoring accesses to shared objects cached in local memory, can be evaluated by the number of remote memory references (RMRs) in an execution. Two flavours of this measure, cache-coherent (CC) and distributed shared memory (DSM), model two popular shared-memory architectures. The number of RMRs, however, does not take into account the granularity of memory accesses, namely, the fact that accesses to shared memory are performed in blocks. This paper proposes a new measure, called block RMRs, which counts the number of remote memory references while taking into account that shared objects can be grouped into blocks. On the one hand, this measure reflects the fact that the RMR incurred for bringing a shared object into local memory might save another RMR for bringing in another object placed in the same block. On the other hand, this measure accounts for false sharing: an RMR may be incurred when accessing an object because of a concurrent access to another object in the same block.

This paper proves that in both the CC and the DSM models, finding an optimal placement is NP-hard when objects have different sizes, even for two processes. In the CC model, finding an optimal placement, i.e., a grouping of objects into blocks, is NP-hard when a block can store three or more objects; the result holds even if the sequence of accesses is known in advance. In the DSM model, the answer depends on whether cache coherence is supported, i.e., whether there is an efficient mechanism to inform processes that the data in their local memory is no longer valid. If coherence is supported with cheap invalidation, then finding an optimal placement is NP-hard. If coherence is not supported and the sequence of accesses is known in advance, an optimal placement can be achieved by placing each object in the memory of the process that accesses it most often.
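The last result, for DSM without coherence, can be illustrated with a minimal sketch. It is not the paper's construction. It assumes that the access sequence is given as (process, object) pairs, that every access to an object stored in another process's memory costs exactly one RMR, and that local memories have no capacity limit. The names `majority_placement` and `dsm_rmrs` are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def majority_placement(accesses):
    """Map each object to the process that accesses it most often.

    `accesses` is a sequence of (process, object) pairs known in advance.
    Ties go to the process that accessed the object first.
    """
    counts = defaultdict(Counter)
    for process, obj in accesses:
        counts[obj][process] += 1
    return {obj: c.most_common(1)[0][0] for obj, c in counts.items()}


def dsm_rmrs(accesses, placement):
    """Count accesses to objects stored in another process's memory.

    Assumption: without coherence nothing is cached, so every such
    access is charged one RMR.
    """
    return sum(1 for process, obj in accesses if placement[obj] != process)


# Hypothetical access sequence for two processes, p and q.
accesses = [("p", "x"), ("q", "x"), ("p", "x"),
            ("q", "y"), ("q", "y"), ("p", "y")]
placement = majority_placement(accesses)
print(placement, dsm_rmrs(accesses, placement))
```

Under these assumptions the objects do not interact, so each object's cost can be minimised on its own, which is why a per-object majority choice suffices in this simplified setting.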

Keywords

false sharing
cache coherence
distributed shared memory
NP-hardness
