Performance Anomalies in Concurrent Data Structure Microbenchmarks

Kharal, Rosina F.; Brown, Trevor

doi:10.4230/LIPIcs.OPODIS.2022.7

Abstract

Recent decades have witnessed a surge in the development of concurrent data structures with an increasing interest in data structures implementing concurrent sets (CSets). Microbenchmarking tools are frequently utilized to evaluate and compare the performance differences across concurrent data structures. The underlying structure and design of the microbenchmarks themselves can play a hidden but influential role in performance results. However, the impact of microbenchmark design has not been well investigated. In this work, we illustrate instances where concurrent data structure performance results reported by a microbenchmark can vary 10-100x depending on the microbenchmark implementation details. We investigate factors leading to performance variance across three popular microbenchmarks and outline cases in which flawed microbenchmark design can lead to an inversion of performance results between two concurrent data structure implementations. We further derive a set of recommendations for best practices in the design and usage of concurrent data structure microbenchmarks and explore advanced features in the Setbench microbenchmark.

Austin Appleby. Murmurhash3, 2012. URL: https://github.com/aappleby/smhasher/wiki/MurmurHash3.
Maya Arbel-Raviv and Trevor Brown. Harnessing epoch-based reclamation for efficient range queries. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '18, pages 14-27, New York, NY, USA, 2018. Association for Computing Machinery. URL: https://doi.org/10.1145/3178487.3178489.
Maya Arbel-Raviv, Trevor Brown, and Adam Morrison. Getting to the root of concurrent binary search tree performance. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 295-306, 2018.
Joy Arulraj, Justin Levandoski, Umar Farooq Minhas, and Per-Ake Larson. Bztree: A high-performance latch-free range index for non-volatile memory. Proceedings of the VLDB Endowment, 11(5):553-565, 2018.
Dmitry Basin, Edward Bortnikov, Anastasia Braginsky, Guy Golan-Gueta, Eshcar Hillel, Idit Keidar, and Moshe Sulamy. Kiwi: A key-value map for scalable real-time analytics. In Proceedings of the 22Nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 357-369, 2017.
Trevor Brown. Good data structure experiments are r.a.r.e, 2017. URL: https://www.youtube.com/watch?v=x6HaBcRJHFY.
Trevor Brown. Powerful tools for data structure experiments in c++, 2021. URL: https://gitlab.com/trbot86/setbench.
Trevor Brown, Aleksandar Prokopec, and Dan Alistarh. Non-blocking interpolation search trees with doubly-logarithmic running time. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 276-291, 2020.
Trevor Alexander Brown. Reclaiming memory for lock-free data structures: There has to be a better way. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, pages 261-270, 2015.
Wikipedia contributors. Xorshift. 2022. URL: https://en.wikipedia.org/wiki/Xorshift.
Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. Asynchronized concurrency: The secret to scaling concurrent search data structures. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pages 631-644, New York, NY, USA, 2015. Association for Computing Machinery. URL: https://doi.org/10.1145/2694344.2694359.
Tudor Alexandru David, Rachid Guerraoui, Tong Che, and Vasileios Trigonakis. Designing ascy-compliant concurrent search data structures. Technical report, EPFL Infoscience, 2014.
J. Evans. Scalable memory allocation using jemalloc, 2011. URL: https://www.facebook.com/notes/10158791475077200/.
Vincent Gramoli. More than you ever wanted to know about synchronization: Synchrobench, measuring the impact of the synchronization on concurrent algorithms. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 1-10, 2015.
Vincent Gramoli. The information needed for reproducing shared memory experiments. In European conference on parallel processing, pages 596-608. Springer, 2016.
Rachid Guerraoui and Vasileios Trigonakis. Optimistic concurrency with optik. ACM SIGPLAN Notices, 51(8):1-12, 2016.
Tim Harris. Benchmarking concurrent data structures, 2016. URL: https://timharris.uk/misc/2017-tpc.pdf.
Tim Harris. Do not believe everything you read in the papers, 2016. URL: https://timharris.uk/misc/2016-nicta.pdf.
Gael Hofemeier and Robert Chesebrough. Introduction to intel aes-ni and intel secure key instructions. Intel, White Paper, 62, 2012.
John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. Random number generators for massively parallel simulations on gpu, 1975. URL: https://doi.org/10.1109/SFCS.1975.23.
Nikolaos D Kallimanis. Synch: A framework for concurrent data-structures and benchmarks. arXiv preprint arXiv:2103.16182, 2021.
Rosina Kharal and Trevor Brown. Performance anomalies in concurrent data structure microbenchmarks, 2022. URL: https://doi.org/10.48550/ARXIV.2208.08469.
Onur Kocberber, Babak Falsafi, and Boris Grot. Asynchronous memory access chaining. Proceedings of the VLDB Endowment, 9(4):252-263, 2015.
Petr Kuznetsov and LTCI INFRES. Refining concurrency for perfomance, 2017.
Viktor Leis, Alfons Kemper, and Thomas Neumann. The adaptive radix tree: Artful indexing for main-memory databases. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 38-49. IEEE, 2013.
Brenton Lessley, Kenneth Moreland, Matthew Larsen, and Hank Childs. Techniques for data-parallel searching for duplicate elements. In 2017 IEEE 7th Symposium on Large Data Analysis and Visualization (LDAV), pages 1-5, 2017. URL: https://doi.org/10.1109/LDAV.2017.8231845.
Justin J Levandoski, David B Lomet, and Sudipta Sengupta. The bw-tree: A b-tree for new hardware platforms. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 302-313. IEEE, 2013.
M. Manssen, M. Weigel, and A. K. Hartmann. Random number generators for massively parallel simulations on gpu. The European Physical Journal Special Topics, 210(1):53-71, August 2012. URL: https://doi.org/10.1140/epjst/e2012-01637-8.
Yandong Mao, Eddie Kohler, and Robert Tappan Morris. Cache craftiness for fast multicore key-value storage. In Proceedings of the 7th ACM european conference on Computer Systems, pages 183-196, 2012.
George Marsaglia. Xorshift rngs. Journal of Statistical Software, 8:1-6, 2003.
Makoto Matsumoto and Takuji Nishimura. Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul., 8(1):3-30, January 1998. URL: https://doi.org/10.1145/272991.272995.
Frank McSherry, Michael Isard, and Derek G Murray. Scalability! but at what COST? In 15th Workshop on Hot Topics in Operating Systems (HotOS XV), 2015.
Chi Cao Minh, JaeWoong Chung, Christos Kozyrakis, and Kunle Olukotun. Stamp: Stanford transactional applications for multi-processing. In 2008 IEEE International Symposium on Workload Characterization, pages 35-46. IEEE, 2008.
Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F Sweeney. Producing wrong data without doing anything obviously wrong! ACM Sigplan Notices, 44(3):265-276, 2009.
N. Unnikrishnan Nair, P.G. Sankaran, and N. Balakrishnan. Chapter 3 - discrete lifetime models. In N. Unnikrishnan Nair, P.G. Sankaran, and N. Balakrishnan, editors, Reliability Modelling and Analysis in Discrete Time, pages 107-173. Academic Press, Boston, 2018. URL: https://doi.org/10.1016/B978-0-12-801913-9.00003-8.
Morita Naoyuki. Pseudo random number generator with mrg (multiple recursive generator), 2020. URL: https://www.schneier.com/blog/archives/2008/05/random_number_b.html.
Aravind Natarajan and Neeraj Mittal. Fast concurrent lock-free binary search trees. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pages 317-328, 2014. URL: https://doi.org/10.1145/2555243.2555256.
Donald Nguyen and Keshav Pingali. What scalable programs need from transactional memory. SIGPLAN Not., 52(4):105-118, April 2017. URL: https://doi.org/10.1145/3093336.3037750.
Ruslan Nikolaev and Binoy Ravindran. Brief Announcement: Crystalline: Fast and Memory Efficient Wait-Free Reclamation. In 35th International Symposium on Distributed Computing (DISC 2021), volume 209 of Leibniz International Proceedings in Informatics (LIPIcs), pages 60:1-60:4. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.DISC.2021.60.
Javier Picorel, Djordje Jevdjic, and Babak Falsafi. Near-memory address translation. In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 303-317. Ieee, 2017.
Wenjia Ruan, Yujie Liu, and Michael Spear. Stamp need not be considered harmful. In Ninth ACM SIGPLAN Workshop on Transactional Computing, 2014.
Tomer Shanny and Adam Morrison. Occualizer: Optimistic concurrent search trees from sequential code. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), pages 321-337, 2022.
Gali Sheffi, Maurice Herlihy, and Erez Petrank. Vbr: Version based reclamation. In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, pages 443-445, 2021.
Thomas Shrimpton and R Seth Terashima. A provable-security analysis of intel’s secure key rng. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 77-100. Springer, 2015.
Hardy-Francis Simon. Murmurhash2, 2010, 2010. URL: https://simonhf.wordpress.com/2010/09/25/murmurhash160/.
Ajay Singh, Trevor Brown, and Ali Mashtizadeh. Nbr: neutralization based reclamation. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 175-190, 2021.
Ajay Singh and Sathya Peri. Efficient means of Achieving Composability using Transactional Memory. PhD thesis, Indian Institute of Technology Hyderabad, 2017.
Ajay Singh, Sathya Peri, G Monika, and Anila Kumari. Performance comparison of various stm concurrency control protocols using synchrobench. In 2017 National Conference on Parallel Computing Technologies (PARCOMPTECH), pages 1-7. IEEE, 2017.
Dan Terpstra, Heike Jagode, Haihang You, and Jack Dongarra. Collecting performance data with papi-c. In Matthias S. Müller, Michael M. Resch, Alexander Schulz, and Wolfgang E. Nagel, editors, Tools for High Performance Computing 2009, pages 157-173, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G Andersen. Building a bw-tree takes more than just buzz words. In Proceedings of the 2018 International Conference on Management of Data, pages 473-488, 2018.
Haosen Wen, Joseph Izraelevitz, Wentao Cai, H Alan Beadle, and Michael L Scott. Interval-based memory reclamation. ACM SIGPLAN Notices, 53(1):1-13, 2018.
Haosen Wen, Joseph Izraelevitz, Wentao Cai, H Alan Beadle, and Michael L Scott. Interval-based memory reclamation. ACM SIGPLAN Notices, 53(1):1-13, 2018.
Xiangyao Yu. An evaluation of concurrency control with one thousand cores. PhD thesis, Massachusetts Institute of Technology, 2015.
Yoav Zuriel, Michal Friedman, Gali Sheffi, Nachshon Cohen, and Erez Petrank. Efficient lock-free durable sets. Proceedings of the ACM on Programming Languages, 3(OOPSLA):1-26, 2019.

Performance Anomalies in Concurrent Data Structure Microbenchmarks

Authors Rosina F. Kharal, Trevor Brown

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Performance Anomalies in Concurrent Data Structure Microbenchmarks

Authors Rosina F. Kharal, Trevor Brown

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

Supplementary Materials

References