Fast and Robust Vectorized In-Place Sorting of Primitive Types

Blacher, Mark; Giesen, Joachim; Kühne, Lars

doi:10.4230/LIPIcs.SEA.2021.3

Abstract

Modern CPUs provide single instruction-multiple data (SIMD) instructions. SIMD instructions process several elements of a primitive data type simultaneously in fixed-size vectors. Classical sorting algorithms are not directly expressible in SIMD instructions. Accelerating sorting algorithms with SIMD instruction is therefore a creative endeavor. A promising approach for sorting with SIMD instructions is to use sorting networks for small arrays and Quicksort for large arrays. In this paper we improve vectorization techniques for sorting networks and Quicksort. In particular, we show how to use the full capacity of vector registers in sorting networks and how to make vectorized Quicksort robust with respect to different key distributions. To demonstrate the performance of our techniques we implement an in-place hybrid sorting algorithm for the data type int with AVX2 intrinsics. Our implementation is at least 30% faster than state-of-the-art high-performance sorting alternatives.

Michael Axtmann, Sascha Witt, Daniel Ferizovic, and Peter Sanders. In-Place Parallel Super Scalar Samplesort (IPSSSSo). In 25th Annual European Symposium on Algorithms (ESA 2017), volume 87 of Leibniz International Proceedings in Informatics (LIPIcs), pages 9:1-9:14. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.ESA.2017.9.
Jakob Andreas Bærentzen, Jens Gravesen, François Anton, and Henrik Aanæs. Guide to Computational Geometry Processing. Springer, 2012. URL: https://doi.org/10.1007/978-1-4471-4075-7.
Kenneth E. Batcher. Sorting networks and their applications. In Proceedings of the April 30-May 2, 1968, Spring Joint Computer Conference, pages 307-314. ACM, 1968. URL: https://doi.org/10.1145/1468075.1468121.
BLAS. Basic linear algebra subprograms. URL: http://www.netlib.org/blas/.
Berenger Bramas. A novel hybrid quicksort algorithm vectorized using avx-512 on intel skylake. International Journal of Advanced Computer Science and Applications, 8(10), 2017. URL: https://doi.org/10.14569/IJACSA.2017.081044.
Randal Bryant. Computer Systems: A Programmer’s Perspective. Pearson, 3rd edition, 2016.
Shi-Kuo Chang. Data Structures and Algorithms. World Scientific, 2003.
Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, and Pradeep Dubey. Efficient implementation of sorting on multi-core simd cpu architecture. Proceedings of the VLDB Endowment, 1(2):1313-1324, 2008. URL: https://doi.org/10.14778/1454159.1454171.
Michael Codish, Luís Cruz-Filipe, Thorsten Ehlers, Mike Müller, and Peter Schneider-Kamp. Sorting networks: To the end and back again. Journal of Computer and System Sciences, 2016. URL: https://doi.org/10.1016/j.jcss.2016.04.004.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT press, 3rd edition, 2009.
Amjad M. Daoud, Hussein Abdel-jaber, and Jafar Ababneh. Efficient non-quadratic quick sort (nqquicksort). In Ezendu Ariwa and Eyas El-Qawasmeh, editors, Digital Enterprise and Information Systems, pages 667-675. Springer, 2011.
Developer Zone. Intelsuperscript registered Software Developer Zone, 2008. URL: https://software.intel.com/en-us/articles/measuring-instruction-latency-and-throughput.
Jay Devore and Kenneth Berk. Modern Mathematical Statistics With Applications. Springer, 2012.
Jack J. Dongarra, Fred G. Gustavson, and Alan H. Karp. Implementing linear algebra algorithms for dense matrices on a vector pipeline machine. SIAM Review, 26(1):91-112, 1984. URL: https://doi.org/10.1137/1026003.
Michael Griebel. Numerical Simulation in Molecular Dynamics: Numerics, Algorithms, Parallelization, Applications. Springer, 2007.
Shay Gueron and Vlad Krasnov. Fast quicksort implementation using avx instructions. The Computer Journal, 59(1):83-90, 2016. URL: https://doi.org/10.1093/comjnl/bxv063.
David R. Helman, David A. Bader, and Joseph JáJá. A randomized parallel sorting algorithm with an experimental study. Journal of Parallel and Distributed Computing, 52(1):1-23, 1998. URL: https://doi.org/10.1006/jpdc.1998.1462.
Michael Herf. Radix tricks, 2001. URL: http://stereopsis.com/radix.html.
Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu, and Toshio Nakatani. Aa-sort: A new parallel sorting algorithm for multi-core simd processors. In 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pages 189-198, 2007. URL: https://doi.org/10.1109/PACT.2007.4336211.
Hiroshi Inoue and Kenjiro Taura. Simd- and cache-friendly algorithm for sorting an array of structures. Proceedings of the VLDB Endowment, 8(11):1274-1285, 2015. URL: https://doi.org/10.14778/2809974.2809988.
Intel Corporation. Developer Reference for Intelsuperscript registered Integrated Performance Primitives. URL: https://software.intel.com/en-us/ipp-dev-reference.
Intel Corporation. Intelsuperscript registered 64 and IA-32 Architectures Software Developer’s Manual, 2016.
Intel Corporation. Intelsuperscriptregistered Integrated Performance Primitives Cryptography: Developer Guide, 2020. URL: https://software.intel.com/sites/default/files/ippcp-devguide.pdf.
Intrinsics Guide. Intel intrinsics guide. URL: https://software.intel.com/sites/landingpage/IntrinsicsGuide/.
Adrian Kaehler and Gary Bradski. Learning OpenCV 3: Computer Vision in C++ With the OpenCV Library. O'Reilly Media, 2016.
Ronald Kneusel. Random Numbers and Computers. Springer, 1st edition, 2018.
Donald E. Knuth. The Art of Computer Programming: Volume 3: Sorting and Searching. Addison-Wesley, 2nd edition, 1998.
Bernhard Korte and Jens Vygen. Combinatorial Optimization: Theory and Algorithms. Springer, 2006.
Ibrahim Küçük. Astrophysics. IntechOpen, 2012.
Tapio Lahdenmäki and Michael Leach. Relational Database Index Design and the Optimizers. John Wiley & Sons, 2005.
Stewart A. Levin. A fully vectorized quicksort. Parallel Computing, 16:369-373, 1990. URL: https://doi.org/10.1016/0167-8191(90)90074-J.
Peng Li, David J. Lilja, Weikang Qian, Kia Bazargan, and Marc D. Riedel. Computation on stochastic bit streams digital image processing case studies. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22(3):449-462, 2014. URL: https://doi.org/10.1109/TVLSI.2013.2247429.
Michal Ozery-Flato and Ron Shamir. Sorting by translocations via reversals theory. In Comparative Genomics, pages 87-98. Springer, 2006. URL: https://doi.org/10.1007/11864127_8.
Ian Parberry. The pairwise sorting network. Parallel Processing Letters, 2:205-211, 1992. URL: https://doi.org/10.1142/S0129626492000337.
Steven Ross. Boost.sort. URL: https://www.boost.org/doc/libs/1_67_0/libs/sort/doc/html/index.html.
Robert Sedgewick and Kevin Wayne. Algorithms. Addison-Wesley, 4th edition, 2011.
Howard Jay Siegel. The universality of various types of simd machine interconnection networks. SIGARCH Comput. Archit. News, 5(7):70-79, 1977. URL: https://doi.org/10.1145/633615.810655.
Harold S. Stone. Sorting on star. IEEE Transactions on Software Engineering, SE-4(2):138-146, 1978. URL: https://doi.org/10.1109/TSE.1978.231484.
Martin Weisser. Essential Programming for Linguistics. Edinburgh University Press, 2009.
Zekun Yin, Tianyu Zhang, André Müller, Hui Liu, Yanjie Wei, Bertil Schmidt, and Weiguo Liu. Efficient parallel sort on avx-512-based multi-core and many-core architectures. In 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 168-176. IEEE, 2019. URL: https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00038.
Marco Zagha and Guy E. Blelloch. Radix sort for vector multiprocessors. In Proceedings of the 1991 ACM/IEEE conference on Supercomputing, pages 712-721. ACM, 1991. URL: https://doi.org/10.1145/125826.126164.

Fast and Robust Vectorized In-Place Sorting of Primitive Types

Authors Mark Blacher, Joachim Giesen, Lars Kühne

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Fast and Robust Vectorized In-Place Sorting of Primitive Types

Authors Mark Blacher, Joachim Giesen, Lars Kühne

File

Document Identifiers

Author Details

Funding

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Supplementary Materials

References

Thanks for your feedback!

Could not send message