Scalable Distributed String Sorting

Authors: Florian Kurpicz, Pascal Mehnert, Peter Sanders, Matthias Schimek




File

LIPIcs.ESA.2024.83.pdf
  • Filesize: 1.12 MB
  • 17 pages

Author Details

Florian Kurpicz
  • Karlsruhe Institute of Technology, Germany
Pascal Mehnert
  • Independent, Germany
Peter Sanders
  • Karlsruhe Institute of Technology, Germany
Matthias Schimek
  • Karlsruhe Institute of Technology, Germany

Cite As

Florian Kurpicz, Pascal Mehnert, Peter Sanders, and Matthias Schimek. Scalable Distributed String Sorting. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 83:1-83:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ESA.2024.83

Abstract

String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines since they either have latency (at least) proportional to the number of processors p or communicate the data a large number of times (at least logarithmically many times). We present practical and efficient algorithms for distributed-memory string sorting that scale to large p. Similar to state-of-the-art sorters for atomic objects, the algorithms have latency of about p^{1/k} when allowing the data to be communicated k times. Experiments indicate good scaling behavior on a wide range of inputs on up to 49152 cores. Overall, we achieve speedups of up to 4.9 over the current state-of-the-art distributed string sorting algorithms.
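To make the latency trade-off in the abstract concrete, the sketch below evaluates a deliberately simplified cost model. It is not the paper's analysis. It assumes a k-level scheme in which, at every level, each processor (PE) exchanges data only with the other r - 1 members of its group, where r = ceil(p^{1/k}). Under that assumption, the number of message startups per PE is about k(r - 1), and each string is sent k times. The function names `group_size` and `multilevel_cost` are hypothetical and chosen for illustration.

```python
def group_size(p: int, k: int) -> int:
    """Smallest integer r with r**k >= p, i.e. r = ceil(p ** (1/k))."""
    r = max(1, round(p ** (1.0 / k)))
    while r ** k < p:                    # correct floating-point underestimates
        r += 1
    while r > 1 and (r - 1) ** k >= p:   # correct floating-point overestimates
        r -= 1
    return r


def multilevel_cost(p: int, k: int) -> tuple[int, int]:
    """Simplified cost model for a k-level data exchange on p PEs.

    Assumption (hypothetical model): at each of the k levels, every PE
    communicates only with the other r - 1 PEs of its group, where
    r = ceil(p ** (1/k)).
    Returns (message startups per PE, number of times each string is sent).
    """
    r = group_size(p, k)
    return k * (r - 1), k


if __name__ == "__main__":
    p = 49152  # largest core count mentioned in the abstract
    for k in (1, 2, 3):
        startups, rounds = multilevel_cost(p, k)
        print(f"k={k}: ~{startups} startups per PE, data sent {rounds} time(s)")
```

For p = 49152 this model yields about 49151, 442 and 108 startups per PE for k = 1, 2 and 3. Each additional level therefore reduces latency sharply, at the price of one more pass of the data over the network.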

Subject Classification

ACM Subject Classification
  • Theory of computation → Sorting and searching
  • Theory of computation → Massively parallel algorithms
  • Computing methodologies → Distributed algorithms
  • Theory of computation → Bloom filters and hashing
Keywords
  • sorting
  • strings
  • distributed-memory computing
  • distributed membership filters
  • scalability
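The classification and keywords mention Bloom filters, hashing and distributed membership filters. A common use of such filters in string sorting is to approximate each string's distinguishing prefix, so that only this prefix, rather than the whole string, needs to be communicated. The sketch below illustrates this idea with prefix doubling. It is a sequential illustration under stated assumptions, not the paper's implementation. An exact `Counter` over CRC-32 hashes stands in for a distributed membership filter, and the function name `approx_distinguishing_prefixes` and the `start` parameter are hypothetical. A hash collision can only make a prefix look non-unique, so collisions lengthen the estimate but never make it too short.

```python
import zlib
from collections import Counter


def approx_distinguishing_prefixes(strings: list[bytes], start: int = 2) -> list[int]:
    """Prefix-doubling estimate of each string's distinguishing prefix length.

    Sequential stand-in (assumption): an exact Counter over CRC-32 hashes
    plays the role a distributed membership filter would play across PEs.
    """
    result = [0] * len(strings)
    active = list(range(len(strings)))
    length = max(1, start)  # a length of 0 would never grow by doubling
    while active:
        # Hash the current candidate prefix of every still-active string.
        hashes = {i: zlib.crc32(strings[i][:length]) for i in active}
        counts = Counter(hashes.values())
        remaining = []
        for i in active:
            s = strings[i]
            if counts[hashes[i]] == 1 or length >= len(s):
                # Unique prefix hash, or the whole string is already covered.
                result[i] = min(length, len(s))
            else:
                remaining.append(i)  # possible duplicate: try a longer prefix
        active = remaining
        length *= 2
    return result


if __name__ == "__main__":
    words = [b"apple", b"apricot", b"banana", b"band", b"cat"]
    print(approx_distinguishing_prefixes(words))  # [4, 4, 4, 4, 2]
```

Doubling keeps the number of hashing rounds logarithmic in the longest distinguishing prefix. Barring collisions, each estimate is at most about twice the true length.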
