Buffered Streaming Edge Partitioning

Authors Adil Chhabra , Marcelo Fonseca Faraj , Christian Schulz , Daniel Seemaier



File

LIPIcs.SEA.2024.5.pdf
  • Filesize: 1.21 MB
  • 21 pages

Author Details

Adil Chhabra
  • Heidelberg University, Germany
Marcelo Fonseca Faraj
  • Heidelberg University, Germany
Christian Schulz
  • Heidelberg University, Germany
Daniel Seemaier
  • Karlsruhe Institute of Technology, Germany

Acknowledgements

We acknowledge support by DFG grant SCHU 2567/5-1. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 882500). Moreover, we would like to acknowledge Dagstuhl Seminar 23331 on Recent Trends in Graph Decomposition [Karypis et al., 2024].

Cite As

Adil Chhabra, Marcelo Fonseca Faraj, Christian Schulz, and Daniel Seemaier. Buffered Streaming Edge Partitioning. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 5:1-5:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SEA.2024.5

Abstract

Addressing the challenges of processing massive graphs, which are prevalent in diverse fields such as social, biological, and technical networks, we introduce HeiStreamE and FreightE, two innovative (buffered) streaming algorithms designed for efficient edge partitioning of large-scale graphs. HeiStreamE utilizes an adapted Split-and-Connect graph model and a Fennel-based multilevel partitioning scheme, while FreightE partitions a hypergraph representation of the input graph. Besides ensuring superior solution quality, these approaches also overcome the limitations of existing algorithms: their time and memory complexity depend linearly on the graph size and do not depend on the number of partition blocks. Our comprehensive experimental analysis demonstrates that HeiStreamE outperforms current streaming algorithms and the re-streaming algorithm 2PS in partitioning quality (replication factor), and is more memory-efficient for real-world networks where the number of edges is far greater than the number of vertices. Further, FreightE is shown to compute partitions quickly and efficiently, particularly for higher numbers of partition blocks.
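The quality metric named in the abstract, the replication factor, can be illustrated with a minimal sketch. It assumes the common definition: the average number of distinct blocks in which a vertex appears once every edge has been assigned to exactly one block. The function name, the input layout, and the choice to count only vertices incident to at least one edge are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def replication_factor(edges, assignment):
    """Average number of blocks each vertex is replicated in.

    edges:      list of (u, v) pairs (hypothetical input layout)
    assignment: list of block ids, assignment[i] is the block of edges[i]
    Only vertices incident to at least one edge are counted (assumption).
    """
    blocks_of = defaultdict(set)
    for (u, v), block in zip(edges, assignment):
        # An edge placed in a block puts a copy of both endpoints there.
        blocks_of[u].add(block)
        blocks_of[v].add(block)
    total_copies = sum(len(blocks) for blocks in blocks_of.values())
    return total_copies / len(blocks_of)


# A 4-cycle split into two blocks of two edges each.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
assignment = [0, 0, 1, 1]
rf = replication_factor(edges, assignment)
print(rf)  # 1.5: vertices 0 and 2 appear in both blocks, 1 and 3 in one
```

A value of 1.0 means no vertex is replicated; lower values indicate less communication between blocks in a distributed computation.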

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Theory of computation → Graph algorithms analysis
Keywords
  • graph partitioning
  • edge partitioning
  • streaming
  • online
  • buffered partitioning

References

  1. Dan Alistarh, Jennifer Iglesias, and Milan Vojnovic. Streaming min-max hypergraph partitioning. In Advances in Neural Information Processing Systems, pages 1900-1908, 2015. URL: https://doi.org/10.5555/2969442.2969452.
  2. C.J. Alpert, Jen-Hsin Huang, and Andrew Kahng. Multilevel circuit partitioning. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 17:655-667, September 1998. URL: https://doi.org/10.1109/43.712098.
  3. Amel Awadelkarim and Johan Ugander. Prioritized restreaming algorithms for balanced graph partitioning. In Proc. of the 26th ACM SIGKDD Intl. Conf. on Knowledge Discovery & Data Mining, pages 1877-1887, 2020. URL: https://doi.org/10.1145/3394486.3403239.
  4. David A. Bader, Henning Meyerhenke, Peter Sanders, Christian Schulz, Andrea Kappes, and Dorothea Wagner. Benchmarking for graph clustering and partitioning. In Encyclopedia of Social Network Analysis and Mining, pages 73-82. Springer New York, 2014. URL: https://doi.org/10.1007/978-1-4614-6170-8_23.
  5. Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. BUbiNG: Massive crawling for the masses. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web, pages 227-228. International World Wide Web Conferences Steering Committee, 2014.
  6. Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Sadagopan Srinivasan, Krithi Ramamritham, Arun Kumar, M. P. Ravindra, Elisa Bertino, and Ravi Kumar, editors, Proceedings of the 20th international conference on World Wide Web, pages 587-596. ACM Press, 2011.
  7. Paolo Boldi and Sebastiano Vigna. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), pages 595-601, Manhattan, USA, 2004. ACM Press.
  8. Florian Bourse, Marc Lelarge, and Milan Vojnovic. Balanced Graph Edge Partition. In Proc. of 20th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, KDD '14, pages 1456-1465. ACM, 2014. URL: https://doi.org/10.1145/2623330.2623660.
  9. Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Gorke, Martin Hoefer, Zoran Nikoloski, and Dorothea Wagner. On modularity clustering. IEEE transactions on knowledge and data engineering, 20(2):172-188, 2007. URL: https://doi.org/10.1109/TKDE.2007.190689.
  10. Thang Nguyen Bui and Curt Jones. Finding Good Approximate Vertex and Edge Partitions is NP-Hard. Information Processing Letters, 42(3):153-159, 1992. URL: https://doi.org/10.1016/0020-0190(92)90140-Q.
  11. Aydın Buluç, Henning Meyerhenke, Ilya Safro, Peter Sanders, and Christian Schulz. Recent Advances in Graph Partitioning, pages 117-158. Springer Intl. Publishing, Cham, 2016. URL: https://doi.org/10.1007/978-3-319-49487-6_4.
  12. Ü. V. Çatalyürek, M. Deveci, K. Kaya, and B. Uçar. UMPa: A Multi-objective, Multi-level Partitioner for Communication Minimization. In 10th DIMACS Impl. Challenge Workshop: Graph Partitioning and Graph Clustering. Georgia Institute of Technology, Atlanta, GA, February 13-14 2012.
  13. Ümit V. Çatalyürek and Cevdet Aykanat. Patoh (partitioning tool for hypergraphs). In Encyclopedia of Parallel Computing, pages 1479-1487. Springer, 2011. URL: https://doi.org/10.1007/978-0-387-09766-4_93.
  14. Ümit V. Çatalyürek, Karen D. Devine, Marcelo Fonseca Faraj, Lars Gottesbüren, Tobias Heuer, Henning Meyerhenke, Peter Sanders, Sebastian Schlag, Christian Schulz, Daniel Seemaier, and Dorothea Wagner. More recent advances in (hyper)graph partitioning. ACM Computing Surveys, 55:1-38, 2023. URL: https://doi.org/10.1145/3571808.
  15. K.D. Devine, E.G. Boman, R.T. Heaphy, R.G. Bisseling, and Ümit Çatalyürek. Parallel hypergraph partitioning for scientific computing. International Conference on Parallel and Distributed Processing (IPDPS), 20:124-124, 2006.
  16. Elizabeth D. Dolan and Jorge J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201-213, January 2002. URL: https://doi.org/10.1007/s101070100263.
  17. Kamal Eyubov, Marcelo Fonseca Faraj, and Christian Schulz. FREIGHT: Fast Streaming Hypergraph Partitioning. In Loukas Georgiadis, editor, Intl. Sym. on Experimental Algorithms (SEA), volume 265 of Leibniz International Proceedings in Informatics (LIPIcs), pages 15:1-15:16, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.SEA.2023.15.
  18. Marcelo Fonseca Faraj and Christian Schulz. Buffered streaming graph partitioning. ACM J. Exp. Algorithmics, 27:1.10:1-1.10:26, 2022. URL: https://doi.org/10.1145/3546911.
  19. Marcelo Fonseca Faraj and Christian Schulz. Recursive multi-section on the fly: Shared-memory streaming algorithms for hierarchical graph partitioning and process mapping. In 2022 IEEE Intl. Conf. on Cluster Computing (CLUSTER), pages 473-483, 2022. URL: https://doi.org/10.1109/CLUSTER51413.2022.00057.
  20. Daniel Funke, Sebastian Lamm, Peter Sanders, Christian Schulz, Darren Strash, and Moritz von Looz. Communication-free massively distributed graph generation. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 336-347, 2018. URL: https://doi.org/10.1109/IPDPS.2018.00043.
  21. Michael R. Garey, David S. Johnson, and Larry Stockmeyer. Some Simplified NP-Complete Problems. In Proc. of the 6th ACM Sym. on Theory of Computing, (STOC), pages 47-63. ACM, 1974. URL: https://doi.org/10.1145/800119.803884.
  22. Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Presented as part of the 10th USENIX Sym. on Operating Systems Design and Implementation (OSDI 12), pages 17-30, 2012. URL: https://doi.org/10.5555/2387880.2387883.
  23. Lars Gottesbüren, Tobias Heuer, Peter Sanders, and Sebastian Schlag. Scalable Shared-Memory Hypergraph Partitioning. In Proc. of the Sym. on Algorithm Engineering and Experiments ALENEX, pages 16-30, 2021. URL: https://doi.org/10.1137/1.9781611976472.2.
  24. Loc Hoang, Roshan Dathathri, Gurbinder Gill, and Keshav Pingali. Cusp: A customizable streaming edge partitioner for distributed graph analytics. In 2019 IEEE Intl. Parallel and Distributed Processing Sym. (IPDPS), pages 439-450. IEEE, 2019. URL: https://doi.org/10.1109/IPDPS.2019.00054.
  25. Nazanin Jafari, Oguz Selvitopi, and Cevdet Aykanat. Fast shared-memory streaming multilevel graph partitioning. Journal of Parallel and Distributed Computing, 147:140-151, 2021. URL: https://doi.org/10.1016/j.jpdc.2020.09.004.
  26. Nilesh Jain, Guangdeng Liao, and Theodore L. Willke. Graphbuilder: Scalable graph ETL framework. In First International Workshop on Graph Data Management Experiences and Systems, GRADES '13, New York, NY, USA, 2013. Association for Computing Machinery. URL: https://doi.org/10.1145/2484425.2484429.
  27. Igor Kabiljo, Brian Karrer, Mayank Pundir, Sergey Pupyrev, Alon Shalita, Yaroslav Akhremtsev, and Alessandro Presta. Social hash partitioner: A scalable distributed hypergraph partitioner. Proc. VLDB Endow., 10(11):1418-1429, 2017. URL: https://doi.org/10.14778/3137628.3137650.
  28. George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359-392, 1998. URL: https://doi.org/10.1137/S1064827595287997.
  29. George Karypis and Vipin Kumar. Multilevel k-way hypergraph partitioning. In Proceedings of the 36th Conference on Design Automation, pages 343-348. ACM Press, 1999. URL: https://doi.org/10.1145/309847.309954.
  30. George Karypis, Christian Schulz, Darren Strash, Deepak Ajwani, Rob H. Bisseling, Katrin Casel, Ümit V. Çatalyürek, Cédric Chevalier, Florian Chudigiewitsch, Marcelo Fonseca Faraj, Michael Fellows, Lars Gottesbüren, Tobias Heuer, Kamer Kaya, Jakub Lacki, Johannes Langguth, Xiaoye Sherry Li, Ruben Mayer, Johannes Meintrup, Yosuke Mizutani, François Pellegrini, Fabrizio Petrini, Frances Rosamond, Ilya Safro, Sebastian Schlag, Roohani Sharma, Blair D. Sullivan, Bora Uçar, and Albert-Jan Yzelman. Recent Trends in Graph Decomposition (Dagstuhl Seminar 23331). Dagstuhl Reports, 13(8):1-45, 2024. URL: https://doi.org/10.4230/DagRep.13.8.1.
  31. Jure Leskovec. Stanford Network Analysis Package (SNAP), 2013.
  32. Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29-123, 2009. URL: https://doi.org/10.1080/15427951.2009.10129177.
  33. Lingda Li, Robel Geda, Ari B. Hayes, Yan-Hao Chen, Pranav Chaudhari, Eddy Z. Zhang, and Mario Szegedy. A simple yet effective balanced edge partition model for parallel computing. Proc. ACM Meas. Anal. Comput. Syst., 1(1):14:1-14:21, 2017. URL: https://doi.org/10.1145/3084451.
  34. Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716-727, April 2012. URL: https://doi.org/10.14778/2212351.2212354.
  35. Christian Mayer, Ruben Mayer, Muhammad Adnan Tariq, Heiko Geppert, Larissa Laich, Lukas Rieger, and Kurt Rothermel. ADWISE: Adaptive window-based streaming edge partitioning for high-speed graph processing. In 2018 IEEE 38th Intl. Conf. on Distributed Computing Systems (ICDCS), pages 685-695. IEEE, 2018. URL: https://doi.org/10.1109/ICDCS.2018.00072.
  36. Ruben Mayer, Kamil Orujzade, and Hans-Arno Jacobsen. 2ps: High-quality edge partitioning with two-phase streaming. CoRR, abs/2001.07086, 2020. URL: https://arxiv.org/abs/2001.07086.
  37. Ruben Mayer, Kamil Orujzade, and Hans-Arno Jacobsen. Out-of-core edge partitioning at linear run-time. In 38th IEEE International Conference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9-12, 2022, pages 2629-2642. IEEE, 2022. URL: https://doi.org/10.1109/ICDE53745.2022.00242.
  38. Henning Meyerhenke, Peter Sanders, and Christian Schulz. Partitioning complex networks via size-constrained clustering. In Experimental Algorithms - 13th International Symposium, SEA, volume 8504 of LNCS, pages 351-363. Springer, 2014. URL: https://doi.org/10.1007/978-3-319-07959-2_30.
  39. Joel Nishimura and Johan Ugander. Restreaming graph partitioning: simple versatile algorithms for advanced balancing. In Proc. of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1106-1114, 2013. URL: https://doi.org/10.1145/2487575.2487696.
  40. François Pellegrini and Jean Roman. Experimental analysis of the dual recursive bipartitioning algorithm for static mapping. Technical report, TR 1038-96, LaBRI, URA CNRS 1304, Univ. Bordeaux I, 1996.
  41. Fabio Petroni, Leonardo Querzoni, Khuzaima Daudjee, Shahin Kamali, and Giorgio Iacoboni. Hdrf: Stream-based partitioning for power-law graphs. In Proc. of the 24th ACM Intl. on Conf. on Information and Knowledge Management, pages 243-252, 2015. URL: https://doi.org/10.1145/2806416.2806424.
  42. Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. http://networkrepository.com, 2015.
  43. Hooman Peiro Sajjad, Amir H Payberah, Fatemeh Rahimian, Vladimir Vlassov, and Seif Haridi. Boosting vertex-cut partitioning for streaming graphs. In 2016 IEEE Intl. Congress on Big Data (BigData Congress), pages 1-8. IEEE, 2016. URL: https://doi.org/10.1109/BigDataCongress.2016.10.
  44. Peter Sanders and Christian Schulz. Think Locally, Act Globally: Highly Balanced Graph Partitioning. In 12th Intl. Sym. on Experimental Algorithms (SEA), LNCS. Springer, 2013. URL: https://doi.org/10.1007/978-3-642-38527-8_16.
  45. Sebastian Schlag, Vitali Henne, Tobias Heuer, Henning Meyerhenke, Peter Sanders, and Christian Schulz. k-way hypergraph partitioning via n-level recursive bisection. In Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments, ALENEX, pages 53-67. SIAM, 2016. URL: https://doi.org/10.1137/1.9781611974317.5.
  46. Sebastian Schlag, Christian Schulz, Daniel Seemaier, and Darren Strash. Scalable edge partitioning. In Proc. of the 21st Workshop on Algorithm Engineering and Experiments, ALENEX 2019, San Diego, CA, USA, January 7-8, 2019, pages 211-225. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975499.17.
  47. Christian Schulz and Darren Strash. Graph partitioning: Formulations and applications to big data. In Encyclopedia of Big Data Technologies. Springer, 2019. URL: https://doi.org/10.1007/978-3-319-63962-8_312-2.
  48. Isabelle Stanton and Gabriel Kliot. Streaming graph partitioning for large distributed graphs. In Proc. of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1222-1230, 2012. URL: https://doi.org/10.1145/2339530.2339722.
  49. Monireh Taimouri and Hamid Saadatfar. Rbsep: a reassignment and buffer based streaming edge partitioning approach. Journal of Big Data, 6(1):92, October 2019. URL: https://doi.org/10.1186/s40537-019-0257-5.
  50. Fatih Taşyaran, Berkay Demireller, Kamer Kaya, and Bora Uçar. Streaming Hypergraph Partitioning Algorithms on Limited Memory Environments. In HPCS 2020 - Intl. Conf. on High Performance Computing & Simulation, pages 1-8. IEEE, 2021. URL: https://hal.archives-ouvertes.fr/hal-03182122.
  51. Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In Proc. of the 7th ACM international conference on Web search and data mining, pages 333-342, 2014. URL: https://doi.org/10.1145/2556195.2556213.
  52. Brendan Vastenhouw and Rob Bisseling. A two-dimensional data distribution method for parallel sparse matrix-vector multiplication. SIAM Review, 47, June 2002. URL: https://doi.org/10.1137/S0036144502409019.
  53. Cong Xie, Ling Yan, Wu-Jun Li, and Zhihua Zhang. Distributed power-law graph computing: Theoretical and empirical analysis. In Advances in Neural Information Processing Systems 27: Annual Conf. on Neural Information Processing Systems, pages 1673-1681, 2014. URL: https://proceedings.neurips.cc/paper/2014/hash/67d16d00201083a2b118dd5128dd6f59-Abstract.html.
  54. Chenzi Zhang, Fan Wei, Qin Liu, Zhihao Gavin Tang, and Zhenguo Li. Graph edge partitioning via neighborhood heuristic. In Proc. of the 23rd ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pages 605-614. ACM, 2017. URL: https://doi.org/10.1145/3097983.3098033.