In-Place Parallel Super Scalar Samplesort (IPSSSSo)

Authors Michael Axtmann, Sascha Witt, Daniel Ferizovic, Peter Sanders

Thumbnail PDF


  • Filesize: 0.91 MB
  • 14 pages

Document Identifiers

Author Details

Michael Axtmann
Sascha Witt
Daniel Ferizovic
Peter Sanders

Cite AsGet BibTex

Michael Axtmann, Sascha Witt, Daniel Ferizovic, and Peter Sanders. In-Place Parallel Super Scalar Samplesort (IPSSSSo). In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


We present a sorting algorithm that works in-place, executes in parallel, is cache-efficient, avoids branch-mispredictions, and performs work O(n log n) for arbitrary inputs with high probability. The main algorithmic contributions are new ways to make distribution-based algorithms in-place: On the practical side, by using coarse-grained block-based permutations, and on the theoretical side, we show how to eliminate the recursion stack. Extensive experiments shw that our algorithm IPSSSSo scales well on a variety of multi-core machines. We outperform our closest in-place competitor by a factor of up to 3. Even as a sequential algorithm, we are up to 1.5 times faster than the closest sequential competitor, BlockQuicksort.
  • shared memory
  • parallel sorting
  • in-place algorithm
  • comparison-based sorting
  • branch prediction


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Lars Arge, Michael T Goodrich, Michael Nelson, and Nodari Sitchinava. Fundamental parallel algorithms for private-cache chip multiprocessors. In 20th Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 197-206. ACM, 2008. Google Scholar
  2. Michael Axtmann, Timo Bingmann, Peter Sanders, and Christian Schulz. Practical massively parallel sorting. In 27th ACM Symposium on Parallelism in Algorithms and Architectures, (SPAA), 2015. URL:
  3. Michael Axtmann, Sascha Witt, Daniel Ferizovic, and Peter Sanders. In-place parallel super scalar samplesort (IPSSSSo). arXiv preprint arXiv:1705.02257, 2017. URL:
  4. Timo Bingmann and Peter Sanders. Parallel string sample sort. In European Symposium on Algorithms, pages 169-180. Springer, 2013. Google Scholar
  5. Guy E. Blelloch, Phillip B. Gibbons, and Harsha Vardhan Simhadri. Low depth cache-oblivious algorithms. In Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures, pages 189-199. ACM, 2010. Google Scholar
  6. Guy E. Blelloch, Charles E. Leiserson, Bruce M Maggs, C. Greg Plaxton, Stephen J. Smith, and Marco Zagha. A comparison of sorting algorithms for the connection machine CM-2. In Proceedings of the third annual ACM symposium on Parallel algorithms and architectures, pages 3-16. ACM, 1991. Google Scholar
  7. G. S. Brodal, R. Fagerberg, and K. Vinther. Engineering a cache-oblivious sorting algorithm. In 6th Workshop on Algorithm Engineering and Experiments, 2004. Google Scholar
  8. Jing-Chao Chen. A simple algorithm for in-place merging. Information Processing Letters, 98(1):34-40, 2006. URL:
  9. Minsik Cho, Daniel Brand, Rajesh Bordawekar, Ulrich Finkler, Vincent Kulandaisamy, and Ruchir Puri. PARADIS: an efficient parallel algorithm for in-place radix sort. Proceedings of the VLDB Endowment, 8(12):1518-1529, 2015. Google Scholar
  10. Stefan Edelkamp and Armin Weiss. BlockQuicksort: Avoiding branch mispredictions in quicksort. In 24th European Symposium on Algorithms (ESA), volume 57 of LIPIcs, 2016. Google Scholar
  11. Gianni Franceschini. Proximity mergesort: Optimal in-place sorting in the cache-oblivious model. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'04, pages 291-299, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics. URL:
  12. Gianni Franceschini and Viliam Geffert. An in-place sorting with O(N log N) comparisons and O(N) moves. J. ACM, 52(4):515-537, July 2005. URL:
  13. W. D. Frazer and A. C. McKellar. Samplesort: A sampling approach to minimal storage tree sorting. J. ACM, 17(3):496-507, July 1970. URL:
  14. Viliam Geffert and Jozef Gajdoš. Multiway in-place merging. In Mirosław Kutyłowski, Witold Charatonik, and Maciej Gębala, editors, 17th Symposium on Fundamentals of Computation Theory (FCT), volume 5699 of LNCS, pages 133-144. Springer, 2009. URL:
  15. Charles A. R. Hoare. Quicksort. The Computer Journal, 5(1):10-16, 1962. Google Scholar
  16. Lorenz Hübschle-Schneider. Super scalar sample sort., retrieved September 15, 2016.
  17. Tomasz Jurkiewicz and Kurt Mehlhorn. On a model of virtual address translation. Journal of Experimental Algorithmics (JEA), 19, 2015. Google Scholar
  18. K. Kaligosi and P. Sanders. How branch mispredictions affect quicksort. In 14th European Symposium on Algorithms (ESA), volume 4168 of LNCS, pages 780-791, 2006. Google Scholar
  19. Kanela Kaligosi and Peter Sanders. How branch mispredictions affect quicksort. In European Symposium on Algorithms, pages 780-791. Springer, 2006. Google Scholar
  20. Jyrki Katajainen, Tomi Pasanen, and Jukka Teuhola. Practical in-place mergesort. Nord. J. Comput., 3(1):27-40, 1996. Google Scholar
  21. Marek Kokot, Sebastian Deorowicz, and Maciej Dlugosz. Even faster sorting of (not only) integers. arXiv preprint arXiv:1703.00687, 2017. Google Scholar
  22. Shrinu Kushagra, Alejandro López-Ortiz, J. Ian Munro, and Aurick Qiao. Multi-pivot quicksort: Theory and experiments. In Meeting on Algorithm Engineering &Experiments (ALENEX), pages 47-60, Philadelphia, PA, USA, 2014. AMS. URL:
  23. David R. Musser. Introspective sorting and selection algorithms. Softw., Pract. Exper., 27(8):983-993, 1997. Google Scholar
  24. N. Rahman. Algorithms for Memory Hierarchies, volume 2625 of LNCS, chapter Algorithms for Hardware Caches and TLB, pages 171-192. Springer, 2003. Google Scholar
  25. James Reinders. Intel threading building blocks: outfitting C++ for multi-core processor parallelism. O'Reilly Media, Inc., 2007. Google Scholar
  26. Peter Sanders and Jan Wassenberg. Engineering a multi-core radix sort. In 17th Euro-Par Conference, volume 6853 of LNCS, pages 160-169. Springer, 2011. Google Scholar
  27. Peter Sanders and Sebastian Winkel. Super scalar sample sort. In 12th European Symposium on Algorithms (ESA), volume 3221 of LNCS, pages 784-796. Springer, 2004. Google Scholar
  28. Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, Harsha Vardhan Simhadri, and Kanat Tangwongsan. Brief announcement: the problem based benchmark suite. In Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures, pages 68-70. ACM, 2012. Google Scholar
  29. J. Singler, P. Sanders, and F. Putze. MCSTL: The multi-core standard template library. In 13th Euro-Par Conference, volume 4641 of LNCS, pages 682-694. Springer, 2007. URL:
  30. P. Tsigas and Y. Zhang. A simple, fast parallel implementation of quicksort and its performance evaluation on SUN Enterprise 10000. In PDP, pages 372-381. IEEE Computer Society, 2003. URL:
  31. Vladimir Yaroslavskiy. Dual-pivot quicksort. Research Disclosure, 2009. Google Scholar