Engineering Shared-Memory Parallel Shuffling to Generate Random Permutations In-Place

Author: Manuel Penschuck

Author Details

Manuel Penschuck
  • Goethe Universität Frankfurt, Germany

Cite As

Manuel Penschuck. Engineering Shared-Memory Parallel Shuffling to Generate Random Permutations In-Place. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 5:1-5:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Abstract

Shuffling is the process of placing elements into a random order such that any permutation occurs with equal probability. It is an important building block in virtually all scientific areas. To the best of our knowledge, we engineer for the first time a practically fast, parallel shuffling algorithm with O(√n log n) parallel depth that requires only poly-logarithmic auxiliary memory (with high probability). In an empirical evaluation, we compare our implementations with a number of existing solutions on various computer architectures. Our algorithms consistently achieve the highest throughput on all machines. Further, we demonstrate that the runtime of our parallel algorithm is comparable to the time that other algorithms may take to acquire the memory from the operating system to copy the input.
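To make the definition of shuffling concrete, the following is a minimal sketch of the classical sequential Fisher-Yates (Knuth) shuffle, which permutes a list in place so that each of the n! permutations is equally likely. It is not the parallel algorithm engineered in the paper; the names `fisher_yates_shuffle` and `permutation_counts` are hypothetical helpers chosen for this illustration.

```python
import random
from collections import Counter


def fisher_yates_shuffle(items, rng=None):
    """Shuffle `items` in place using sequential Fisher-Yates.

    Illustrative baseline only (an assumption of this sketch),
    not the paper's parallel in-place algorithm.
    """
    if rng is None:
        rng = random.Random()
    # Walk from the last position down to index 1; position i receives
    # an element drawn uniformly from the not-yet-fixed prefix [0, i].
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform integer in [0, i]
        items[i], items[j] = items[j], items[i]
    return items


def permutation_counts(n, trials, seed=0):
    """Count how often each permutation of range(n) is produced."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        perm = fisher_yates_shuffle(list(range(n)), rng)
        counts[tuple(perm)] += 1
    return counts
```

Calling `permutation_counts(3, 6000)` tallies the six permutations of three elements. With a uniform shuffle, each count should be close to 1000. The loop uses only a constant amount of extra memory, but it is inherently sequential, which is the limitation a parallel in-place shuffle has to overcome.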

Subject Classification

ACM Subject Classification
  • Theory of computation → Shared memory algorithms

Keywords
  • Shuffling
  • random permutation
  • parallelism
  • in-place
  • algorithm engineering
  • practical implementation



