Streaming Matching and Edge Cover in Practice

Authors: S M Ferdous, Alex Pothen, Mahantesh Halappanavar

Author Details

S M Ferdous
  • Pacific Northwest National Laboratory, Richland, WA, USA
Alex Pothen
  • Purdue University, West Lafayette, IN, USA
Mahantesh Halappanavar
  • Pacific Northwest National Laboratory, Richland, WA, USA

Cite As

S M Ferdous, Alex Pothen, and Mahantesh Halappanavar. Streaming Matching and Edge Cover in Practice. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 12:1-12:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SEA.2024.12

Abstract

Graph algorithms with polynomial space and time requirements often become infeasible for massive graphs with billions of edges or more. State-of-the-art approaches therefore employ approximate serial, parallel, and distributed algorithms to tackle these challenges. However, such approaches require storing the entire graph in memory and thus need access to costly computing resources such as clusters and supercomputers. In this paper, we present practical streaming approaches for solving massive graph problems using limited memory for two prototypical graph problems: maximum weighted matching and minimum weighted edge cover. For matching, we conduct a thorough computational study of two semi-streaming algorithms, including a recent breakthrough result, designed by Paz and Schwartzman [SODA, 2017], that achieves a 1/(2+ε)-approximation of the weight while using O(n log W /ε) memory (here n is the number of vertices and W is the maximum edge weight). Empirically, we show that the semi-streaming algorithms produce matchings whose weight is close to that of the best 1/2-approximate offline algorithm while requiring less time and an order of magnitude less memory. For minimum weighted edge cover, we develop three novel semi-streaming algorithms. Two of these algorithms make a single pass through the input graph, require O(n log n) memory, and provide a 2-approximation guarantee on the objective. We also leverage a relationship between approximate maximum weighted matching and approximate minimum weighted edge cover to develop a two-pass (3/2+ε)-approximate algorithm with the memory requirement of Paz and Schwartzman’s semi-streaming matching algorithm. These streaming approaches are compared against the state-of-the-art 3/2-approximate offline algorithm.
The semi-streaming matching and the novel edge cover algorithms proposed in this paper can process graphs with several billions of edges in under 30 minutes using 6 GB of memory, which is at least an order of magnitude improvement over the offline (non-streaming) algorithms. For the largest graph, the best alternative offline parallel approximation algorithm (GPA+ROMA) could not finish in three hours even while employing hundreds of processors and 1 TB of memory. We also demonstrate an application of semi-streaming algorithms by computing a matching using linearly bounded memory on intersection graphs derived from three machine learning datasets, while the existing offline algorithms could not complete on one of these datasets since their memory requirement exceeded 1 TB.
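
The single-pass matching idea summarized above can be illustrated with a short sketch. The Python below follows the simplified local-ratio formulation (per-vertex potentials plus a stack of retained edges) usually associated with references 36 and 19. It is not the authors' implementation, and the function name `semi_streaming_matching`, the `(u, v, w)` tuple format, and the default value of `eps` are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]


def semi_streaming_matching(edge_stream: Iterable[Edge], eps: float = 0.1) -> List[Edge]:
    """One-pass local-ratio sketch for approximate maximum weighted matching.

    Hypothetical helper for illustration; assumes positive edge weights.
    """
    phi = defaultdict(float)  # per-vertex potential, initially 0
    stack: List[Edge] = []    # edges retained during the pass

    # Streaming phase: each edge is inspected once and then forgotten
    # unless it clears the (1 + eps) threshold against the potentials.
    for u, v, w in edge_stream:
        if u == v:
            continue  # ignore self-loops
        covered = phi[u] + phi[v]
        if w <= (1.0 + eps) * covered:
            continue  # edge is dominated by earlier edges, so discard it
        gain = w - covered
        phi[u] += gain
        phi[v] += gain
        stack.append((u, v, w))

    # Unwinding phase: pop edges in reverse arrival order and keep
    # those whose endpoints are both still unmatched.
    matched = set()
    matching: List[Edge] = []
    while stack:
        u, v, w = stack.pop()
        if u not in matched and v not in matched:
            matched.add(u)
            matched.add(v)
            matching.append((u, v, w))
    return matching
```

In this sketch an edge is stored only when its weight exceeds (1 + eps) times the sum of its endpoints' potentials, which is the mechanism that keeps the number of retained edges small; the graph itself is never held in memory. A call such as `semi_streaming_matching([("a", "b", 1.0), ("b", "c", 3.0), ("c", "d", 1.0)])` retains the first two edges and, after unwinding, selects only the edge between `b` and `c`.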

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Theory of computation → Theory and algorithms for application domains
  • Computing methodologies
  • Computing methodologies → Shared memory algorithms
Keywords
  • Matching
  • Edge Cover
  • Semi-Streaming Algorithm
  • Parallel Algorithms
  • Algorithm Engineering

References

  1. Eugenio Angriman, Michal Boron, and Henning Meyerhenke. A batch-dynamic Suitor algorithm for approximating maximum weighted matching. ACM J. Exp. Algorithmics, 27:1.6:1-1.6:41, 2022. URL: https://doi.org/10.1145/3529228.
  2. Eugenio Angriman, Henning Meyerhenke, Christian Schulz, and Bora Uçar. Fully-dynamic weighted matching approximation in practice. In Proceedings of the SIAM Conference on Applied and Computational Discrete Algorithms (ACDA), pages 32-44. SIAM, 2021. URL: https://doi.org/10.1137/1.9781611976830.4.
  3. David Avis. A survey of heuristics for the weighted matching problem. Networks, 13(4):475-493, 1983. URL: https://doi.org/10.1002/net.3230130404.
  4. Michael Barlow, Christian Konrad, and Charana Nandasena. Streaming set cover in practice. In Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX), pages 181-192. SIAM, 2021.
  5. Andre Berge. A parallel version of the random order augmentation matching algorithm. Master’s thesis, University of Bergen, 2020.
  6. Michael S. Crouch and Daniel M. Stubbs. Improved streaming algorithms for weighted matching, via unweighted matching. In Proceedings of Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM), volume 28 of LIPIcs, pages 96-104. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2014. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2014.96.
  7. Timothy A. Davis and Yifan Hu. The University of Florida sparse matrix collection. ACM Trans. Math. Softw., 38(1):1:1-1:25, 2011. URL: https://doi.org/10.1145/2049662.2049663.
  8. Doratha E. Drake and Stefan Hougardy. A simple approximation algorithm for the weighted matching problem. Inf. Process. Lett., 85(4):211-213, 2003. URL: https://doi.org/10.1016/S0020-0190(02)00393-9.
  9. Ran Duan and Seth Pettie. Linear-time approximation for maximum weight matching. J. ACM, 61(1):1:1-1:23, 2014. URL: https://doi.org/10.1145/2529989.
  10. David Ediger, Robert McColl, E. Jason Riedy, and David A. Bader. STINGER: High performance data structure for streaming graphs. In Proceedings of IEEE Conference on High Performance Extreme Computing (HPEC), pages 1-5. IEEE, 2012. URL: https://doi.org/10.1109/HPEC.2012.6408680.
  11. Yuval Emek and Adi Rosén. Semi-streaming set cover. ACM Trans. Algorithms, 13(1):6:1-6:22, 2016. URL: https://doi.org/10.1145/2957322.
  12. Leah Epstein, Asaf Levin, Julián Mestre, and Danny Segev. Improved approximation guarantees for weighted matching in the semi-streaming model. SIAM J. Discret. Math., 25(3):1251-1265, 2011. URL: https://doi.org/10.1137/100801901.
  13. Kamal Eyubov, Marcelo Fonseca Faraj, and Christian Schulz. FREIGHT: Fast streaming hypergraph partitioning. In Proceedings of the 21st International Symposium on Experimental Algorithms (SEA), volume 265 of LIPIcs, pages 15:1-15:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. URL: https://doi.org/10.4230/LIPICS.SEA.2023.15.
  14. Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2-3):207-216, 2005. URL: https://doi.org/10.1016/j.tcs.2005.09.013.
  15. SM Ferdous. smferdous1/GraST. Software (visited on 13/05/2024). URL: https://github.com/smferdous1/GraST.
  16. SM Ferdous, Arif Khan, and Alex Pothen. Parallel algorithms through approximation: b-edge cover. In Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 22-33. IEEE, 2018.
  17. SM Ferdous, Alex Pothen, and Arif Khan. New approximation algorithms for minimum weighted edge cover. In Proceedings of the Eighth SIAM Workshop on Combinatorial Scientific Computing (CSC), pages 97-108. SIAM, 2018. URL: https://doi.org/10.1137/1.9781611975215.10.
  18. Buddhima Gamlath, Sagar Kale, Slobodan Mitrovic, and Ola Svensson. Weighted matchings via unweighted augmentations. In Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), pages 491-500. ACM, 2019. URL: https://doi.org/10.1145/3293611.3331603.
  19. Mohsen Ghaffari and David Wajc. Simplified and space-optimal semi-streaming (2+ε)-approximate matching. In Proceedings of the 2nd Symposium on Simplicity in Algorithms (SOSA), volume 69 of OASIcs, pages 13:1-13:8. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/OASIcs.SOSA.2019.13.
  20. Kathrin Hanauer, Monika Henzinger, Stefan Schmid, and Jonathan Trummer. Fast and heavy disjoint weighted matchings for demand-aware datacenter topologies. In Proceedings of IEEE Conference on Computer Communications (INFOCOM), pages 1649-1658. IEEE, 2022. URL: https://doi.org/10.1109/INFOCOM48880.2022.9796921.
  21. Kathrin Hanauer, Monika Henzinger, and Christian Schulz. Recent advances in fully dynamic graph algorithms - A quick reference guide. ACM J. Exp. Algorithmics, 27:1.11:1-1.11:45, 2022. URL: https://doi.org/10.1145/3555806.
  22. Monika Henzinger, Shahbaz Khan, Richard Paul, and Christian Schulz. Dynamic matching algorithms in practice. In Proceedings of the 28th Annual European Symposium on Algorithms (ESA), volume 173 of LIPIcs, pages 58:1-58:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.ESA.2020.58.
  23. Monika Rauch Henzinger, Prabhakar Raghavan, and Sridhar Rajagopalan. Computing on data streams. In James M. Abello and Jeffrey Scott Vitter, editors, External Memory Algorithms, Proceedings of a DIMACS Workshop, New Brunswick, New Jersey, USA, May 20-22, 1998, volume 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 107-118. DIMACS/AMS, 1998. URL: https://doi.org/10.1090/dimacs/050/05.
  24. Dawei Huang and Seth Pettie. Approximate generalized matching: f-matchings and f-edge covers. Algorithmica, 84(7):1952-1992, 2022. URL: https://doi.org/10.1007/s00453-022-00949-5.
  25. Tony Jebara, Jun Wang, and Shih-Fu Chang. Graph construction and b-matching for semi-supervised learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 441-448. ACM, 2009. URL: https://doi.org/10.1145/1553374.1553432.
  26. Arif Khan, Krzysztof Choromanski, Alex Pothen, S. M. Ferdous, Mahantesh Halappanavar, and Antonino Tumeo. Adaptive anonymization of data using b-edge cover. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), pages 59:1-59:11. IEEE / ACM, 2018. URL: http://dl.acm.org/citation.cfm?id=3291735.
  27. Arif Khan and Alex Pothen. A new 3/2-approximation algorithm for the b-edge cover problem. In Proceedings of the Seventh SIAM Workshop on Combinatorial Scientific Computing (CSC), pages 52-61. SIAM, 2016. URL: https://doi.org/10.1137/1.9781611974690.CH6.
  28. Arif Khan, Alex Pothen, Md Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Fredrik Manne, Mahantesh Halappanavar, and Pradeep Dubey. Efficient approximation algorithms for weighted b-matching. SIAM J. Sci. Comput., 38(5):S593-S619, 2016. URL: https://doi.org/10.1137/15M1026304.
  29. Matthieu Latapy, Tiphaine Viard, and Clémence Magnien. Stream graphs and link streams for the modeling of interactions over time. Soc. Netw. Anal. Min., 8(1):61:1-61:29, 2018. URL: https://doi.org/10.1007/s13278-018-0537-7.
  30. Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van, Sören Auer, et al. DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6(2):167-195, 2015.
  31. Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142-150. The Association for Computer Linguistics, 2011. URL: https://aclanthology.org/P11-1015/.
  32. Jens Maue and Peter Sanders. Engineering algorithms for approximate weighted matching. In Proceedings of the 6th International Workshop of Experimental Algorithms (WEA), volume 4525, page 242. Springer, 2007. URL: https://doi.org/10.1007/978-3-540-72845-0_19.
  33. Andrew McGregor. Finding graph matchings in data streams. In Proceedings of the 8th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX), volume 3624 of LNCS, pages 170-181. Springer, 2005. URL: https://doi.org/10.1007/11538462_15.
  34. Andrew McGregor. Graph stream algorithms: A survey. SIGMOD Rec., 43(1):9-20, 2014. URL: https://doi.org/10.1145/2627692.2627694.
  35. S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2):117-236, 2005. URL: https://doi.org/10.1561/0400000002.
  36. Ami Paz and Gregory Schwartzman. A (2+ε)-approximation for maximum weight matching in the semi-streaming model. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2153-2161. SIAM, 2017. URL: https://doi.org/10.1137/1.9781611974782.140.
  37. Seth Pettie and Peter Sanders. A simpler linear time 2/3-ε approximation for maximum weight matching. Inf. Process. Lett., 91(6):271-276, 2004. URL: https://doi.org/10.1016/J.IPL.2004.05.007.
  38. Yoann Pigné, Antoine Dutot, Frédéric Guinand, and Damien Olivier. Graphstream: A tool for bridging the gap between complex systems and dynamic graphs. CoRR, abs/0803.2093, 2008. URL: https://arxiv.org/abs/0803.2093.
  39. Alex Pothen, S. M. Ferdous, and Fredrik Manne. Approximation algorithms in combinatorial scientific computing. Acta Numer., 28:541-633, 2019. URL: https://doi.org/10.1017/S0962492919000035.
  40. Robert Preis. Linear time 1/2-approximation algorithm for maximum weighted matching in general graphs. In Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science (STACS), volume 1563 of Lecture Notes in Computer Science, pages 259-269. Springer, 1999. URL: https://doi.org/10.1007/3-540-49116-3_24.
  41. David Tench, Evan West, Victor Zhang, Michael A Bender, Abiyaz Chowdhury, J Ahmed Dellas, Martin Farach-Colton, Tyler Seip, and Kenny Zhang. GraphZeppelin: Storage-friendly sketching for connected components on dynamic graph streams. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 325-339. ACM, 2022. URL: https://doi.org/10.1145/3514221.3526146.
  42. Mariano Zelke. Weighted matching in the semi-streaming model. Algorithmica, 62(1-2):1-20, 2012. URL: https://doi.org/10.1007/s00453-010-9438-5.
  43. Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Proceedings of Annual Conference on Neural Information Processing Systems, pages 649-657, 2015. URL: https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html.