Performance Anomalies in Concurrent Data Structure Microbenchmarks

Authors Rosina F. Kharal, Trevor Brown



PDF
Thumbnail PDF

File

LIPIcs.OPODIS.2022.7.pdf
  • Filesize: 1.71 MB
  • 24 pages

Document Identifiers

Author Details

Rosina F. Kharal
  • University of Waterloo, Canada
Trevor Brown
  • University of Waterloo, Canada

Acknowledgements

We thank the reviewers for their helpful comments and suggestions.

Cite AsGet BibTex

Rosina F. Kharal and Trevor Brown. Performance Anomalies in Concurrent Data Structure Microbenchmarks. In 26th International Conference on Principles of Distributed Systems (OPODIS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 253, pp. 7:1-7:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.OPODIS.2022.7

Abstract

Recent decades have witnessed a surge in the development of concurrent data structures with an increasing interest in data structures implementing concurrent sets (CSets). Microbenchmarking tools are frequently utilized to evaluate and compare the performance differences across concurrent data structures. The underlying structure and design of the microbenchmarks themselves can play a hidden but influential role in performance results. However, the impact of microbenchmark design has not been well investigated. In this work, we illustrate instances where concurrent data structure performance results reported by a microbenchmark can vary 10-100x depending on the microbenchmark implementation details. We investigate factors leading to performance variance across three popular microbenchmarks and outline cases in which flawed microbenchmark design can lead to an inversion of performance results between two concurrent data structure implementations. We further derive a set of recommendations for best practices in the design and usage of concurrent data structure microbenchmarks and explore advanced features in the Setbench microbenchmark.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Concurrent computing methodologies
  • Theory of computation → Concurrency
  • Computing methodologies → Massively parallel and high-performance simulations
Keywords
  • concurrent microbenchmarks
  • concurrent data structures
  • concurrent performance evaluation
  • PRNGs
  • parallel computing

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Austin Appleby. Murmurhash3, 2012. URL: https://github.com/aappleby/smhasher/wiki/MurmurHash3.
  2. Maya Arbel-Raviv and Trevor Brown. Harnessing epoch-based reclamation for efficient range queries. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '18, pages 14-27, New York, NY, USA, 2018. Association for Computing Machinery. URL: https://doi.org/10.1145/3178487.3178489.
  3. Maya Arbel-Raviv, Trevor Brown, and Adam Morrison. Getting to the root of concurrent binary search tree performance. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 295-306, 2018. Google Scholar
  4. Joy Arulraj, Justin Levandoski, Umar Farooq Minhas, and Per-Ake Larson. Bztree: A high-performance latch-free range index for non-volatile memory. Proceedings of the VLDB Endowment, 11(5):553-565, 2018. Google Scholar
  5. Dmitry Basin, Edward Bortnikov, Anastasia Braginsky, Guy Golan-Gueta, Eshcar Hillel, Idit Keidar, and Moshe Sulamy. Kiwi: A key-value map for scalable real-time analytics. In Proceedings of the 22Nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 357-369, 2017. Google Scholar
  6. Trevor Brown. Good data structure experiments are r.a.r.e, 2017. URL: https://www.youtube.com/watch?v=x6HaBcRJHFY.
  7. Trevor Brown. Powerful tools for data structure experiments in c++, 2021. URL: https://gitlab.com/trbot86/setbench.
  8. Trevor Brown, Aleksandar Prokopec, and Dan Alistarh. Non-blocking interpolation search trees with doubly-logarithmic running time. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 276-291, 2020. Google Scholar
  9. Trevor Alexander Brown. Reclaiming memory for lock-free data structures: There has to be a better way. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, pages 261-270, 2015. Google Scholar
  10. Wikipedia contributors. Xorshift. 2022. URL: https://en.wikipedia.org/wiki/Xorshift.
  11. Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. Asynchronized concurrency: The secret to scaling concurrent search data structures. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pages 631-644, New York, NY, USA, 2015. Association for Computing Machinery. URL: https://doi.org/10.1145/2694344.2694359.
  12. Tudor Alexandru David, Rachid Guerraoui, Tong Che, and Vasileios Trigonakis. Designing ascy-compliant concurrent search data structures. Technical report, EPFL Infoscience, 2014. Google Scholar
  13. J. Evans. Scalable memory allocation using jemalloc, 2011. URL: https://www.facebook.com/notes/10158791475077200/.
  14. Vincent Gramoli. More than you ever wanted to know about synchronization: Synchrobench, measuring the impact of the synchronization on concurrent algorithms. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 1-10, 2015. Google Scholar
  15. Vincent Gramoli. The information needed for reproducing shared memory experiments. In European conference on parallel processing, pages 596-608. Springer, 2016. Google Scholar
  16. Rachid Guerraoui and Vasileios Trigonakis. Optimistic concurrency with optik. ACM SIGPLAN Notices, 51(8):1-12, 2016. Google Scholar
  17. Tim Harris. Benchmarking concurrent data structures, 2016. URL: https://timharris.uk/misc/2017-tpc.pdf.
  18. Tim Harris. Do not believe everything you read in the papers, 2016. URL: https://timharris.uk/misc/2016-nicta.pdf.
  19. Gael Hofemeier and Robert Chesebrough. Introduction to intel aes-ni and intel secure key instructions. Intel, White Paper, 62, 2012. Google Scholar
  20. John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. Random number generators for massively parallel simulations on gpu, 1975. URL: https://doi.org/10.1109/SFCS.1975.23.
  21. Nikolaos D Kallimanis. Synch: A framework for concurrent data-structures and benchmarks. arXiv preprint arXiv:2103.16182, 2021. Google Scholar
  22. Rosina Kharal and Trevor Brown. Performance anomalies in concurrent data structure microbenchmarks, 2022. URL: https://doi.org/10.48550/ARXIV.2208.08469.
  23. Onur Kocberber, Babak Falsafi, and Boris Grot. Asynchronous memory access chaining. Proceedings of the VLDB Endowment, 9(4):252-263, 2015. Google Scholar
  24. Petr Kuznetsov and LTCI INFRES. Refining concurrency for perfomance, 2017. Google Scholar
  25. Viktor Leis, Alfons Kemper, and Thomas Neumann. The adaptive radix tree: Artful indexing for main-memory databases. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 38-49. IEEE, 2013. Google Scholar
  26. Brenton Lessley, Kenneth Moreland, Matthew Larsen, and Hank Childs. Techniques for data-parallel searching for duplicate elements. In 2017 IEEE 7th Symposium on Large Data Analysis and Visualization (LDAV), pages 1-5, 2017. URL: https://doi.org/10.1109/LDAV.2017.8231845.
  27. Justin J Levandoski, David B Lomet, and Sudipta Sengupta. The bw-tree: A b-tree for new hardware platforms. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 302-313. IEEE, 2013. Google Scholar
  28. M. Manssen, M. Weigel, and A. K. Hartmann. Random number generators for massively parallel simulations on gpu. The European Physical Journal Special Topics, 210(1):53-71, August 2012. URL: https://doi.org/10.1140/epjst/e2012-01637-8.
  29. Yandong Mao, Eddie Kohler, and Robert Tappan Morris. Cache craftiness for fast multicore key-value storage. In Proceedings of the 7th ACM european conference on Computer Systems, pages 183-196, 2012. Google Scholar
  30. George Marsaglia. Xorshift rngs. Journal of Statistical Software, 8:1-6, 2003. Google Scholar
  31. Makoto Matsumoto and Takuji Nishimura. Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul., 8(1):3-30, January 1998. URL: https://doi.org/10.1145/272991.272995.
  32. Frank McSherry, Michael Isard, and Derek G Murray. Scalability! but at what COST? In 15th Workshop on Hot Topics in Operating Systems (HotOS XV), 2015. Google Scholar
  33. Chi Cao Minh, JaeWoong Chung, Christos Kozyrakis, and Kunle Olukotun. Stamp: Stanford transactional applications for multi-processing. In 2008 IEEE International Symposium on Workload Characterization, pages 35-46. IEEE, 2008. Google Scholar
  34. Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F Sweeney. Producing wrong data without doing anything obviously wrong! ACM Sigplan Notices, 44(3):265-276, 2009. Google Scholar
  35. N. Unnikrishnan Nair, P.G. Sankaran, and N. Balakrishnan. Chapter 3 - discrete lifetime models. In N. Unnikrishnan Nair, P.G. Sankaran, and N. Balakrishnan, editors, Reliability Modelling and Analysis in Discrete Time, pages 107-173. Academic Press, Boston, 2018. URL: https://doi.org/10.1016/B978-0-12-801913-9.00003-8.
  36. Morita Naoyuki. Pseudo random number generator with mrg (multiple recursive generator), 2020. URL: https://www.schneier.com/blog/archives/2008/05/random_number_b.html.
  37. Aravind Natarajan and Neeraj Mittal. Fast concurrent lock-free binary search trees. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pages 317-328, 2014. URL: https://doi.org/10.1145/2555243.2555256.
  38. Donald Nguyen and Keshav Pingali. What scalable programs need from transactional memory. SIGPLAN Not., 52(4):105-118, April 2017. URL: https://doi.org/10.1145/3093336.3037750.
  39. Ruslan Nikolaev and Binoy Ravindran. Brief Announcement: Crystalline: Fast and Memory Efficient Wait-Free Reclamation. In 35th International Symposium on Distributed Computing (DISC 2021), volume 209 of Leibniz International Proceedings in Informatics (LIPIcs), pages 60:1-60:4. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.DISC.2021.60.
  40. Javier Picorel, Djordje Jevdjic, and Babak Falsafi. Near-memory address translation. In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 303-317. Ieee, 2017. Google Scholar
  41. Wenjia Ruan, Yujie Liu, and Michael Spear. Stamp need not be considered harmful. In Ninth ACM SIGPLAN Workshop on Transactional Computing, 2014. Google Scholar
  42. Tomer Shanny and Adam Morrison. Occualizer: Optimistic concurrent search trees from sequential code. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), pages 321-337, 2022. Google Scholar
  43. Gali Sheffi, Maurice Herlihy, and Erez Petrank. Vbr: Version based reclamation. In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, pages 443-445, 2021. Google Scholar
  44. Thomas Shrimpton and R Seth Terashima. A provable-security analysis of intel’s secure key rng. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 77-100. Springer, 2015. Google Scholar
  45. Hardy-Francis Simon. Murmurhash2, 2010, 2010. URL: https://simonhf.wordpress.com/2010/09/25/murmurhash160/.
  46. Ajay Singh, Trevor Brown, and Ali Mashtizadeh. Nbr: neutralization based reclamation. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 175-190, 2021. Google Scholar
  47. Ajay Singh and Sathya Peri. Efficient means of Achieving Composability using Transactional Memory. PhD thesis, Indian Institute of Technology Hyderabad, 2017. Google Scholar
  48. Ajay Singh, Sathya Peri, G Monika, and Anila Kumari. Performance comparison of various stm concurrency control protocols using synchrobench. In 2017 National Conference on Parallel Computing Technologies (PARCOMPTECH), pages 1-7. IEEE, 2017. Google Scholar
  49. Dan Terpstra, Heike Jagode, Haihang You, and Jack Dongarra. Collecting performance data with papi-c. In Matthias S. Müller, Michael M. Resch, Alexander Schulz, and Wolfgang E. Nagel, editors, Tools for High Performance Computing 2009, pages 157-173, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg. Google Scholar
  50. Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G Andersen. Building a bw-tree takes more than just buzz words. In Proceedings of the 2018 International Conference on Management of Data, pages 473-488, 2018. Google Scholar
  51. Haosen Wen, Joseph Izraelevitz, Wentao Cai, H Alan Beadle, and Michael L Scott. Interval-based memory reclamation. ACM SIGPLAN Notices, 53(1):1-13, 2018. Google Scholar
  52. Haosen Wen, Joseph Izraelevitz, Wentao Cai, H Alan Beadle, and Michael L Scott. Interval-based memory reclamation. ACM SIGPLAN Notices, 53(1):1-13, 2018. Google Scholar
  53. Xiangyao Yu. An evaluation of concurrency control with one thousand cores. PhD thesis, Massachusetts Institute of Technology, 2015. Google Scholar
  54. Yoav Zuriel, Michal Friedman, Gali Sheffi, Nachshon Cohen, and Erez Petrank. Efficient lock-free durable sets. Proceedings of the ACM on Programming Languages, 3(OOPSLA):1-26, 2019. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail