NUMASK: High Performance Scalable Skip List for NUMA

Daly, Henry; Hassan, Ahmed; Spear, Michael F.; Palmieri, Roberto

doi:10.4230/LIPIcs.DISC.2018.18

Abstract

This paper presents NUMASK, a skip list data structure specifically designed to exploit the characteristics of Non-Uniform Memory Access (NUMA) architectures to improve performance. NUMASK deploys an architecture around a concurrent skip list so that all metadata accesses (e.g., traversals of the skip list index levels) read and write memory blocks allocated in the NUMA zone where the thread is executing. To the best of our knowledge, NUMASK is the first NUMA-aware skip list design that goes beyond merely limiting the performance penalties introduced by NUMA, and leverages the NUMA architecture to outperform state-of-the-art concurrent high-performance implementations. We tested NUMASK on a four-socket server. Its performance scales for both read-intensive and write-intensive workloads (tested up to 160 threads). In write-intensive workload, NUMASK shows speedups over competitors in the range of 2x to 16x.

numa(3) Linux Programmer’s Manual, second edition, December 2007. URL: https://linux.die.net/man/3/numa.
Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. Hoard: A scalable memory allocator for multithreaded applications. In Larry Rudolph and Anoop Gupta, editors, ASPLOS-IX Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge, MA, USA, November 12-15, 2000., pages 117-128. ACM Press, 2000. Source code available at https://github.com/emeryberger/Hoard. URL: http://dx.doi.org/10.1145/356989.357000.
Sergey Blagodurov, Sergey Zhuravlev, Alexandra Fedorova, and Ali Kamali. A case for numa-aware contention management on multicore systems. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pages 557-558, New York, NY, USA, 2010. ACM. URL: http://dx.doi.org/10.1145/1854273.1854350.
Trevor Brown, Alex Kogan, Yossi Lev, and Victor Luchangco. Investigating the performance of hardware transactions on a multi-socket machine. In Christian Scheideler and Seth Gilbert, editors, Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2016, Asilomar State Beach/Pacific Grove, CA, USA, July 11-13, 2016, pages 121-132. ACM, 2016. URL: http://dx.doi.org/10.1145/2935764.2935796.
Irina Calciu, Dave Dice, Yossi Lev, Victor Luchangco, Virendra J. Marathe, and Nir Shavit. NUMA-aware Reader-writer Locks. In PPoPP '13, 2013.
Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, and Marcos K. Aguilera. Black-box concurrent data structures for NUMA architectures. In Yunji Chen, Olivier Temam, and John Carter, editors, Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2017, Xi'an, China, April 8-12, 2017, pages 207-221. ACM, 2017. URL: http://dx.doi.org/10.1145/3037697.3037721.
Tyler Crain, Vincent Gramoli, and Michel Raynal. A speculation-friendly binary search tree. In J. Ramanujam and P. Sadayappan, editors, Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, February 25-29, 2012, pages 161-170. ACM, 2012. URL: http://dx.doi.org/10.1145/2145816.2145837.
Tyler Crain, Vincent Gramoli, and Michel Raynal. No hot spot non-blocking skip list. In IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013, 8-11 July, 2013, Philadelphia, Pennsylvania, USA, pages 196-205. IEEE Computer Society, 2013. URL: http://dx.doi.org/10.1109/ICDCS.2013.42.
Mohammad Dashti, Alexandra Fedorova, Justin R. Funston, Fabien Gaud, Renaud Lachaize, Baptiste Lepers, Vivien Quéma, and Mark Roth. Traffic management: a holistic approach to memory placement on NUMA systems. In Vivek Sarkar and Rastislav Bodík, editors, Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, Houston, TX, USA - March 16 - 20, 2013, pages 381-394. ACM, 2013. URL: http://dx.doi.org/10.1145/2451116.2451157.
Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. Asynchronized concurrency: The secret to scaling concurrent search data structures. In Özcan Özturk, Kemal Ebcioglu, and Sandhya Dwarkadas, editors, Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, Istanbul, Turkey, March 14-18, 2015, pages 631-644. ACM, 2015. URL: http://dx.doi.org/10.1145/2694344.2694359.
David Dice, Virendra J. Marathe, and Nir Shavit. Lock Cohorting: A General Technique for Designing NUMA Locks. In PPoPP '12, 2012.
Ian Dick, Alan Fekete, and Vincent Gramoli. A skip list for multicore. Concurrency and Computation: Practice and Experience, 29(4), 2017. URL: http://dx.doi.org/10.1002/cpe.3876.
Jason Evans. jemalloc memory allocator. URL: https://github.com/jemalloc/jemalloc.
Mikhail Fomitchev and Eric Ruppert. Lock-free linked lists and skip lists. In Proceedings of the Twenty-third Annual ACM Symposium on Principles of Distributed Computing, PODC '04, pages 50-59, New York, NY, USA, 2004. ACM. URL: http://dx.doi.org/10.1145/1011767.1011776.
Keir Fraser. Practical lock-freedom. PhD thesis, University of Cambridge, September 2003.
Fabien Gaud, Baptiste Lepers, Justin Funston, Mohammad Dashti, Alexandra Fedorova, Vivien Quéma, Renaud Lachaize, and Mark Roth. Challenges of memory management on modern numa systems. Commun. ACM, 58(12):59-66, 2015. URL: http://dx.doi.org/10.1145/2814328.
Vincent Gramoli. More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms. In Albert Cohen and David Grove, editors, Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, USA, February 7-11, 2015, pages 1-10. ACM, 2015. URL: http://dx.doi.org/10.1145/2688500.2688501.
Ahmed Hassan, Roberto Palmieri, and Binoy Ravindran. Transactional interference-less balanced tree. In Distributed Computing - 29th International Symposium, DISC 2015, Tokyo, Japan, October 7-9, 2015, Proceedings, pages 325-340, 2015.
Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. Flat combining and the synchronization-parallelism tradeoff. In Proceedings of the Twenty-second Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '10, pages 355-364, New York, NY, USA, 2010. ACM. URL: http://dx.doi.org/10.1145/1810479.1810540.
M. Herlihy and N. Shavit. The art of multiprocessor programming. Morgan Kaufmann, 2008.
Maurice Herlihy, Yossi Lev, Victor Luchangco, and Nir Shavit. A simple optimistic skiplist algorithm. In Structural Information and Communication Complexity, 14th International Colloquium, SIROCCO 2007, Castiglioncello, Italy, June 5-8, 2007, Proceedings, pages 124-138, 2007.
Christoph Lameter. Numa (non-uniform memory access): An overview. Queue, 11(7):40:40-40:51, 2013. URL: http://dx.doi.org/10.1145/2508834.2513149.
Baptiste Lepers, Vivien Quéma, and Alexandra Fedorova. Thread and memory placement on NUMA systems: Asymmetry matters. In Shan Lu and Erik Riedel, editors, 2015 USENIX Annual Technical Conference, USENIX ATC '15, July 8-10, Santa Clara, CA, USA, pages 277-289. USENIX Association, 2015. URL: https://www.usenix.org/conference/atc15/technical-session/presentation/lepers.
Zoltan Majo and Thomas R. Gross. Memory management in numa multicore systems: Trapped between cache contention and interconnect overhead. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 11-20, New York, NY, USA, 2011. ACM. URL: http://dx.doi.org/10.1145/1993478.1993481.
Mohamed Mohamedin, Roberto Palmieri, Sebastiano Peluso, and Binoy Ravindran. On designing numa-aware concurrency control for scalable transactional memory. In Rafael Asenjo and Tim Harris, editors, Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, Barcelona, Spain, March 12-16, 2016, pages 45:1-45:2. ACM, 2016. URL: http://dx.doi.org/10.1145/2851141.2851189.
Stanko Novakovic, Alexandros Daglis, Edouard Bugnion, Babak Falsafi, and Boris Grot. Scale-out NUMA. In Rajeev Balasubramonian, Al Davis, and Sarita V. Adve, editors, Architectural Support for Programming Languages and Operating Systems, ASPLOS '14, Salt Lake City, UT, USA, March 1-5, 2014, pages 3-18. ACM, 2014. URL: http://dx.doi.org/10.1145/2541940.2541965.
Iraklis Psaroudakis, Stefan Kaestle, Matthias Grimmer, Daniel Goodman, Jean-Pierre Lozi, and Timothy L. Harris. Analytics with smart arrays: adaptive and efficient language-independent data. In Rui Oliveira, Pascal Felber, and Y. Charlie Hu, editors, Proceedings of the Thirteenth EuroSys Conference, EuroSys 2018, Porto, Portugal, April 23-26, 2018, pages 17:1-17:15. ACM, 2018. URL: http://dx.doi.org/10.1145/3190508.3190514.
William Pugh. Skip lists: A probabilistic alternative to balanced trees. Commun. ACM, 33(6):668-676, 1990. URL: http://dx.doi.org/10.1145/78973.78977.
Nikita Shamgunov. The memsql in-memory database system. In Justin J. Levandoski and Andrew Pavlo, editors, Proceedings of the 2nd International Workshop on In Memory Data Management and Analytics, IMDM 2014, Hangzhou, China, September 1, 2014., 2014.
Dmitry Vyukov. Unbounded SPSC Queue, 2018. URL: http://www.1024cores.net/home/lock-free-algorithms/queues/unbounded-spsc-queue.

NUMASK: High Performance Scalable Skip List for NUMA

Authors Henry Daly, Ahmed Hassan, Michael F. Spear, Roberto Palmieri

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

NUMASK: High Performance Scalable Skip List for NUMA

Authors Henry Daly, Ahmed Hassan, Michael F. Spear, Roberto Palmieri

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

References

Thanks for your feedback!

Could not send message