An Experimental Study of External Memory Algorithms for Connected Components

Authors Gerth Stølting Brodal, Rolf Fagerberg, David Hammer, Ulrich Meyer, Manuel Penschuck, Hung Tran



PDF
Thumbnail PDF

File

LIPIcs.SEA.2021.23.pdf
  • Filesize: 1.04 MB
  • 23 pages

Document Identifiers

Author Details

Gerth Stølting Brodal
  • Aarhus University, Denmark
Rolf Fagerberg
  • University of Southern Denmark, Odense, Denmark
David Hammer
  • Goethe Universität Frankfurt, Germany
  • University of Southern Denmark, Odense, Denmark
Ulrich Meyer
  • Goethe Universität Frankfurt, Germany
Manuel Penschuck
  • Goethe Universität Frankfurt, Germany
Hung Tran
  • Goethe Universität Frankfurt, Germany

Acknowledgements

Extensive calculations on the Goethe-HLR high-performance computer of the Goethe University Frankfurt were conducted for this research. The authors would like to acknowledge the CSC team for their support. We would also like to thank Peter Sanders for valuable feedback on an earlier draft of this paper.

Cite AsGet BibTex

Gerth Stølting Brodal, Rolf Fagerberg, David Hammer, Ulrich Meyer, Manuel Penschuck, and Hung Tran. An Experimental Study of External Memory Algorithms for Connected Components. In 19th International Symposium on Experimental Algorithms (SEA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 190, pp. 23:1-23:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.SEA.2021.23

Abstract

We empirically investigate algorithms for solving Connected Components in the external memory model. In particular, we study whether the randomized O(Sort(E)) algorithm by Karger, Klein, and Tarjan can be implemented to compete with practically promising and simpler algorithms having only slightly worse theoretical cost, namely Borůvka’s algorithm and the algorithm by Sibeyn and collaborators. For all algorithms, we develop and test a number of tuning options. Our experiments are executed on a large set of different graph classes including random graphs, grids, geometric graphs, and hyperbolic graphs. Among our findings are: The Sibeyn algorithm is a very strong contender due to its simplicity and due to an added degree of freedom in its internal workings when used in the Connected Components setting. With the right tunings, the Karger-Klein-Tarjan algorithm can be implemented to be competitive in many cases. Higher graph density seems to benefit Karger-Klein-Tarjan relative to Sibeyn. Borůvka’s algorithm is not competitive with the two others.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Paths and connectivity problems
  • Theory of computation → Graph algorithms analysis
Keywords
  • Connected Components
  • Experimental Evaluation
  • External Memory
  • Graph Algorithms
  • Randomization

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31(9):1116-1127, 1988. URL: https://doi.org/10.1145/48529.48535.
  2. Susanne Albers, Andreas Crauser, and Kurt Mehlhorn. Lecture notes on algorithms for very large data sets. https://web.archive.org/web/19970816002522/http://www.mpi-sb.mpg.de/~crauser/Plan.ps.gz, 1997.
  3. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. URL: https://doi.org/10.1006/jcss.1997.1545.
  4. Lars Arge, Gerth Stølting Brodal, and Laura Toma. On external-memory MST, SSSP and multi-way planar graph separation. J. Algorithms, 53(2):186-206, 2004. URL: https://doi.org/10.1016/j.jalgor.2004.04.001.
  5. Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In José D. P. Rolim and Salil P. Vadhan, editors, Randomization and Approximation Techniques, 6th International Workshop, RANDOM 2002, Cambridge, MA, USA, September 13-15, 2002, Proceedings, volume 2483 of Lecture Notes in Computer Science, pages 1-10. Springer Berling Heidelberg, 2002. URL: https://doi.org/10.1007/3-540-45726-7_1.
  6. Andreas Beckmann, Roman Dementiev, and Johannes Singler. Building a parallel pipelined external memory algorithm library. In 23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, May 23-29, 2009, pages 1-10. IEEE Computer Society, 2009. URL: https://doi.org/10.1109/IPDPS.2009.5161001.
  7. Alka Bhushan and Sajith Gopalan. An I/O efficient algorithm for minimum spanning trees. In Zaixin Lu, Donghyun Kim, Weili Wu, Wei Li, and Ding-Zhu Du, editors, Combinatorial Optimization and Applications - 9th International Conference, COCOA 2015, Houston, TX, USA, December 18-20, 2015, Proceedings, volume 9486 of Lecture Notes in Computer Science, pages 499-509. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-26626-8_36.
  8. Yi-Jen Chiang, Michael T. Goodrich, Edward F. Grove, Roberto Tamassia, Darren Erik Vengroff, and Jeffrey S. Vitter. External-memory graph algorithms. In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 22-24 January 1995. San Francisco, California, USA, pages 139-149. SIAM, Philadelphia, PA, USA, 1995. URL: http://dl.acm.org/doi/10.5555/313651.313681.
  9. Roman Dementiev, Lutz Kettner, and Peter Sanders. STXXL: standard template library for XXL data sets. Softw. Pract. Exp., 38(6):589-637, 2008. URL: https://doi.org/10.1002/spe.844.
  10. Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182-209, 1985. URL: https://doi.org/10.1016/0022-0000(85)90041-8.
  11. Daniel Funke, Sebastian Lamm, Ulrich Meyer, Manuel Penschuck, Peter Sanders, Christian Schulz, Darren Strash, and Moritz von Looz. Communication-free massively distributed graph generation. J. Parallel Distributed Comput., 131:200-217, 2019. URL: https://doi.org/10.1016/j.jpdc.2019.03.011.
  12. Edgar N. Gilbert. Random graphs. Ann. Math. Statist., 30(4):1141-1144, December 1959. URL: https://doi.org/10.1214/aoms/1177706098.
  13. Edgar N. Gilbert. Random plane networks. Journal of the Society for Industrial and Applied Mathematics, 9(4):533-543, 1961. URL: https://doi.org/10.1137/0109045.
  14. Luca Gugelmann, Konstantinos Panagiotou, and Ueli Peter. Random hyperbolic graphs: Degree sequence and clustering. In Proceedings of the 39th International Colloquium Conference on Automata, Languages, and Programming - Volume Part II, ICALP 2012, Warwick, UK, July 9-13, 2012, page 573–585. Springer Berlin Heidelberg, 2012. URL: https://doi.org/10.1007/978-3-642-31585-5_51.
  15. David R. Karger, Philip N. Klein, and Robert E. Tarjan. A randomized linear-time algorithm to find minimum spanning trees. J. ACM, 42(2):321-328, March 1995. URL: https://doi.org/10.1145/201019.201022.
  16. Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguñá. Hyperbolic geometry of complex networks. Phys. Rev. E, 82:036106, September 2010. URL: https://doi.org/10.1103/PhysRevE.82.036106.
  17. Anil Maheshwari and Norbert Zeh. A survey of techniques for designing I/O-efficient algorithms. In Ulrich Meyer, Peter Sanders, and Jop F. Sibeyn, editors, Algorithms for Memory Hierarchies, Advanced Lectures [Dagstuhl Research Seminar, March 10-14, 2002], volume 2625 of Lecture Notes in Computer Science, pages 36-61. Springer, 2002. URL: https://doi.org/10.1007/3-540-36574-5_3.
  18. Kamesh Munagala and Abhiram G. Ranade. I/O-complexity of graph algorithms. In Robert Endre Tarjan and Tandy J. Warnow, editors, Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms, 17-19 January 1999, Baltimore, Maryland, USA, pages 687-694. ACM/SIAM, 1999. URL: https://dl.acm.org/doi/10.5555/314500.314891.
  19. Mathew D. Penrose. Random Geometric Graphs. Oxford University Press, 2003. URL: https://doi.org/10.1093/acprof:oso/9780198506263.001.0001.
  20. Manuel Penschuck. Generating practical random hyperbolic graphs in near-linear time and with sub-linear memory. In Costas S. Iliopoulos, Solon P. Pissis, Simon J. Puglisi, and Rajeev Raman, editors, 16th International Symposium on Experimental Algorithms, SEA 2017, June 21-23, 2017, London, UK, volume 75 of LIPIcs, pages 26:1-26:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.SEA.2017.26.
  21. Manuel Penschuck, Ulrik Brandes, Michael Hamann, Sebastian Lamm, Ulrich Meyer, Ilya Safro, Peter Sanders, and Christian Schulz. Recent advances in scalable network generation. CoRR, abs/2003.00736, 2020. URL: http://arxiv.org/abs/2003.00736.
  22. Seth Pettie and Vijaya Ramachandran. An optimal minimum spanning tree algorithm. J. ACM, 49(1):16-34, 2002. URL: https://doi.org/10.1145/505241.505243.
  23. Peter Sanders. Fast priority queues for cached memory. ACM J. Exp. Algorithmics, 5:7, 2000. URL: https://doi.org/10.1145/351827.384249.
  24. Dominik Schultes. External Memory Minimum Spanning Trees. Bachelor thesis, Universität des Saarlandes, 2003. URL: http://algo2.iti.kit.edu/schultes/emmst/emmst_short.pdf.
  25. Dominik Schultes. External memory spanning forests and connected components, 2003. URL: http://algo2.iti.kit.edu/dementiev/files/cc.pdf.
  26. Jop Sibeyn, Roman Dementiev, Peter Sanders, and Dominik Schultes. Engineering an external memory minimum spanning tree algorithm. In Jean-Jacques Levy, John C. Mitchell, and Ernst W. Mayr, editors, Exploring New Frontiers of Theoretical Informatics, volume 155, pages 195-208. Kluwer Academic Publishers, Boston, 2004. URL: https://doi.org/10.1007/1-4020-8141-3_17.
  27. Jop F. Sibeyn. External connected components. In Torben Hagerup and Jyrki Katajainen, editors, Algorithm Theory - SWAT 2004, 9th Scandinavian Workshop on Algorithm Theory, Humlebæk, Denmark, July 8-10, 2004, volume 3111, pages 468-479. Springer Berlin Heidelberg, 2004. URL: https://doi.org/10.1007/978-3-540-27810-8_40.
  28. Robert E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22(2):215-225, 1975. URL: https://doi.org/10.1145/321879.321884.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail