Fast and Robust Vectorized In-Place Sorting of Primitive Types

Authors Mark Blacher, Joachim Giesen, Lars Kühne



PDF
Thumbnail PDF

File

LIPIcs.SEA.2021.3.pdf
  • Filesize: 0.99 MB
  • 16 pages

Document Identifiers

Author Details

Mark Blacher
  • Institute for Theoretical Computer Science, Friedrich Schiller Universität Jena, Germany
Joachim Giesen
  • Institute for Theoretical Computer Science, Friedrich Schiller Universität Jena, Germany
Lars Kühne
  • German Aerospace Center (DLR), Jena, Germany

Cite AsGet BibTex

Mark Blacher, Joachim Giesen, and Lars Kühne. Fast and Robust Vectorized In-Place Sorting of Primitive Types. In 19th International Symposium on Experimental Algorithms (SEA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 190, pp. 3:1-3:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.SEA.2021.3

Abstract

Modern CPUs provide single instruction-multiple data (SIMD) instructions. SIMD instructions process several elements of a primitive data type simultaneously in fixed-size vectors. Classical sorting algorithms are not directly expressible in SIMD instructions. Accelerating sorting algorithms with SIMD instruction is therefore a creative endeavor. A promising approach for sorting with SIMD instructions is to use sorting networks for small arrays and Quicksort for large arrays. In this paper we improve vectorization techniques for sorting networks and Quicksort. In particular, we show how to use the full capacity of vector registers in sorting networks and how to make vectorized Quicksort robust with respect to different key distributions. To demonstrate the performance of our techniques we implement an in-place hybrid sorting algorithm for the data type int with AVX2 intrinsics. Our implementation is at least 30% faster than state-of-the-art high-performance sorting alternatives.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sorting and searching
Keywords
  • Quicksort
  • Sorting Networks
  • Vectorization
  • SIMD
  • AVX2

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Michael Axtmann, Sascha Witt, Daniel Ferizovic, and Peter Sanders. In-Place Parallel Super Scalar Samplesort (IPSSSSo). In 25th Annual European Symposium on Algorithms (ESA 2017), volume 87 of Leibniz International Proceedings in Informatics (LIPIcs), pages 9:1-9:14. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.ESA.2017.9.
  2. Jakob Andreas Bærentzen, Jens Gravesen, François Anton, and Henrik Aanæs. Guide to Computational Geometry Processing. Springer, 2012. URL: https://doi.org/10.1007/978-1-4471-4075-7.
  3. Kenneth E. Batcher. Sorting networks and their applications. In Proceedings of the April 30-May 2, 1968, Spring Joint Computer Conference, pages 307-314. ACM, 1968. URL: https://doi.org/10.1145/1468075.1468121.
  4. BLAS. Basic linear algebra subprograms. URL: http://www.netlib.org/blas/.
  5. Berenger Bramas. A novel hybrid quicksort algorithm vectorized using avx-512 on intel skylake. International Journal of Advanced Computer Science and Applications, 8(10), 2017. URL: https://doi.org/10.14569/IJACSA.2017.081044.
  6. Randal Bryant. Computer Systems: A Programmer’s Perspective. Pearson, 3rd edition, 2016. Google Scholar
  7. Shi-Kuo Chang. Data Structures and Algorithms. World Scientific, 2003. Google Scholar
  8. Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, and Pradeep Dubey. Efficient implementation of sorting on multi-core simd cpu architecture. Proceedings of the VLDB Endowment, 1(2):1313-1324, 2008. URL: https://doi.org/10.14778/1454159.1454171.
  9. Michael Codish, Luís Cruz-Filipe, Thorsten Ehlers, Mike Müller, and Peter Schneider-Kamp. Sorting networks: To the end and back again. Journal of Computer and System Sciences, 2016. URL: https://doi.org/10.1016/j.jcss.2016.04.004.
  10. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT press, 3rd edition, 2009. Google Scholar
  11. Amjad M. Daoud, Hussein Abdel-jaber, and Jafar Ababneh. Efficient non-quadratic quick sort (nqquicksort). In Ezendu Ariwa and Eyas El-Qawasmeh, editors, Digital Enterprise and Information Systems, pages 667-675. Springer, 2011. Google Scholar
  12. Developer Zone. Intelsuperscript registered Software Developer Zone, 2008. URL: https://software.intel.com/en-us/articles/measuring-instruction-latency-and-throughput.
  13. Jay Devore and Kenneth Berk. Modern Mathematical Statistics With Applications. Springer, 2012. Google Scholar
  14. Jack J. Dongarra, Fred G. Gustavson, and Alan H. Karp. Implementing linear algebra algorithms for dense matrices on a vector pipeline machine. SIAM Review, 26(1):91-112, 1984. URL: https://doi.org/10.1137/1026003.
  15. Michael Griebel. Numerical Simulation in Molecular Dynamics: Numerics, Algorithms, Parallelization, Applications. Springer, 2007. Google Scholar
  16. Shay Gueron and Vlad Krasnov. Fast quicksort implementation using avx instructions. The Computer Journal, 59(1):83-90, 2016. URL: https://doi.org/10.1093/comjnl/bxv063.
  17. David R. Helman, David A. Bader, and Joseph JáJá. A randomized parallel sorting algorithm with an experimental study. Journal of Parallel and Distributed Computing, 52(1):1-23, 1998. URL: https://doi.org/10.1006/jpdc.1998.1462.
  18. Michael Herf. Radix tricks, 2001. URL: http://stereopsis.com/radix.html.
  19. Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu, and Toshio Nakatani. Aa-sort: A new parallel sorting algorithm for multi-core simd processors. In 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pages 189-198, 2007. URL: https://doi.org/10.1109/PACT.2007.4336211.
  20. Hiroshi Inoue and Kenjiro Taura. Simd- and cache-friendly algorithm for sorting an array of structures. Proceedings of the VLDB Endowment, 8(11):1274-1285, 2015. URL: https://doi.org/10.14778/2809974.2809988.
  21. Intel Corporation. Developer Reference for Intelsuperscript registered Integrated Performance Primitives. URL: https://software.intel.com/en-us/ipp-dev-reference.
  22. Intel Corporation. Intelsuperscript registered 64 and IA-32 Architectures Software Developer’s Manual, 2016. Google Scholar
  23. Intel Corporation. Intelsuperscriptregistered Integrated Performance Primitives Cryptography: Developer Guide, 2020. URL: https://software.intel.com/sites/default/files/ippcp-devguide.pdf.
  24. Intrinsics Guide. Intel intrinsics guide. URL: https://software.intel.com/sites/landingpage/IntrinsicsGuide/.
  25. Adrian Kaehler and Gary Bradski. Learning OpenCV 3: Computer Vision in C++ With the OpenCV Library. O'Reilly Media, 2016. Google Scholar
  26. Ronald Kneusel. Random Numbers and Computers. Springer, 1st edition, 2018. Google Scholar
  27. Donald E. Knuth. The Art of Computer Programming: Volume 3: Sorting and Searching. Addison-Wesley, 2nd edition, 1998. Google Scholar
  28. Bernhard Korte and Jens Vygen. Combinatorial Optimization: Theory and Algorithms. Springer, 2006. Google Scholar
  29. Ibrahim Küçük. Astrophysics. IntechOpen, 2012. Google Scholar
  30. Tapio Lahdenmäki and Michael Leach. Relational Database Index Design and the Optimizers. John Wiley & Sons, 2005. Google Scholar
  31. Stewart A. Levin. A fully vectorized quicksort. Parallel Computing, 16:369-373, 1990. URL: https://doi.org/10.1016/0167-8191(90)90074-J.
  32. Peng Li, David J. Lilja, Weikang Qian, Kia Bazargan, and Marc D. Riedel. Computation on stochastic bit streams digital image processing case studies. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22(3):449-462, 2014. URL: https://doi.org/10.1109/TVLSI.2013.2247429.
  33. Michal Ozery-Flato and Ron Shamir. Sorting by translocations via reversals theory. In Comparative Genomics, pages 87-98. Springer, 2006. URL: https://doi.org/10.1007/11864127_8.
  34. Ian Parberry. The pairwise sorting network. Parallel Processing Letters, 2:205-211, 1992. URL: https://doi.org/10.1142/S0129626492000337.
  35. Steven Ross. Boost.sort. URL: https://www.boost.org/doc/libs/1_67_0/libs/sort/doc/html/index.html.
  36. Robert Sedgewick and Kevin Wayne. Algorithms. Addison-Wesley, 4th edition, 2011. Google Scholar
  37. Howard Jay Siegel. The universality of various types of simd machine interconnection networks. SIGARCH Comput. Archit. News, 5(7):70-79, 1977. URL: https://doi.org/10.1145/633615.810655.
  38. Harold S. Stone. Sorting on star. IEEE Transactions on Software Engineering, SE-4(2):138-146, 1978. URL: https://doi.org/10.1109/TSE.1978.231484.
  39. Martin Weisser. Essential Programming for Linguistics. Edinburgh University Press, 2009. Google Scholar
  40. Zekun Yin, Tianyu Zhang, André Müller, Hui Liu, Yanjie Wei, Bertil Schmidt, and Weiguo Liu. Efficient parallel sort on avx-512-based multi-core and many-core architectures. In 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 168-176. IEEE, 2019. URL: https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00038.
  41. Marco Zagha and Guy E. Blelloch. Radix sort for vector multiprocessors. In Proceedings of the 1991 ACM/IEEE conference on Supercomputing, pages 712-721. ACM, 1991. URL: https://doi.org/10.1145/125826.126164.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail