Parallel Five-Cycle Counting Algorithms

Authors Louisa Ruixue Huang, Jessica Shi, Julian Shun



PDF
Thumbnail PDF

File

LIPIcs.SEA.2021.2.pdf
  • Filesize: 1.44 MB
  • 18 pages

Document Identifiers

Author Details

Louisa Ruixue Huang
  • MIT, CSAIL, Cambridge, MA, USA
Jessica Shi
  • MIT, CSAIL, Cambridge, MA, USA
Julian Shun
  • MIT, CSAIL, Cambridge, MA, USA

Cite AsGet BibTex

Louisa Ruixue Huang, Jessica Shi, and Julian Shun. Parallel Five-Cycle Counting Algorithms. In 19th International Symposium on Experimental Algorithms (SEA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 190, pp. 2:1-2:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.SEA.2021.2

Abstract

Counting the frequency of subgraphs in large networks is a classic research question that reveals the underlying substructures of these networks for important applications. However, subgraph counting is a challenging problem, even for subgraph sizes as small as five, due to the combinatorial explosion in the number of possible occurrences. This paper focuses on the five-cycle, which is an important special case of five-vertex subgraph counting and one of the most difficult to count efficiently. We design two new parallel five-cycle counting algorithms and prove that they are work-efficient and achieve polylogarithmic span. Both algorithms are based on computing low out-degree orientations, which enables the efficient computation of directed two-paths and three-paths, and the algorithms differ in the ways in which they use this orientation to eliminate double-counting. We develop fast multicore implementations of the algorithms and propose a work scheduling optimization to improve their performance. Our experiments on a variety of real-world graphs using a 36-core machine with two-way hyper-threading show that our algorithms achieves 10-46x self-relative speed-up, outperform our serial benchmarks by 10-32x, and outperform the previous state-of-the-art serial algorithm by up to 818x.

Subject Classification

ACM Subject Classification
  • Theory of computation → Shared memory algorithms
  • Theory of computation → Graph algorithms analysis
  • Computing methodologies → Shared memory algorithms
Keywords
  • Cycle counting
  • parallel algorithms
  • graph algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. A. Abboud and V. V. Williams. Popular conjectures imply strong lower bounds for dynamic problems. In Proceedings of the IEEE Symposium on Foundations of Computer Science, pages 434-443, 2014. Google Scholar
  2. Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, Nick G. Duffield, and Theodore L. Willke. Graphlet decomposition: framework, algorithms, and applications. Knowl. Inf. Syst., 50(3):689-722, 2017. Google Scholar
  3. N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209-223, 1997. Google Scholar
  4. Richard Anderson and Ernst W. Mayr. A 𝖯-complete problem and approximations to it. Technical report, Stanford University, 1984. Google Scholar
  5. Leonid Barenboim and Michael Elkin. Sublogarithmic distributed MIS algorithm for sparse graphs using Nash-Williams decomposition. Distributed Computing, 22:363-379, 2010. Google Scholar
  6. Vladimir Batagelj and Matjaž Zaveršnik. Fast algorithms for determining (generalized) core groups in social networks. Advances in Data Analysis and Classification, 5(2):129-145, 2011. Google Scholar
  7. Suman K. Bera, Noujan Pashanasangi, and C. Seshadhri. Linear Time Subgraph Counting, Graph Degeneracy, and the Chasm at Size Six. In Proceedings of the Innovations in Theoretical Computer Science Conference, pages 38:1-38:20, 2020. Google Scholar
  8. Jonathan W. Berry, Luke K. Fostvedt, Daniel J. Nordman, Cynthia A. Phillips, C. Seshadhri, and Alyson G. Wilson. Why do simple algorithms for triangle enumeration work in the real world? In Proceedings of the Conference on Innovations in Theoretical Computer Science, page 225–234, 2014. Google Scholar
  9. Maciej Besta, Armon Carigiet, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, and Torsten Hoefler. High-performance parallel graph coloring with strong guarantees on work, depth, and quality. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2020. Google Scholar
  10. M. A. Bhuiyan, M. Rahman, M. Rahman, and M. Al Hasan. GUISE: Uniform sampling of graphlets for large graph analysis. In Proceedings of the IEEE International Conference on Data Mining, pages 91-100, 2012. Google Scholar
  11. Guy E. Blelloch, Phillip B. Gibbons, and Harsha Vardhan Simhadri. Low depth cache-oblivious algorithms. In Proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures, page 189–199, 2010. Google Scholar
  12. Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM, 46(5):720-748, September 1999. Google Scholar
  13. Richard P. Brent. The parallel evaluation of general arithmetic expressions. J. ACM, 21(2):201-206, April 1974. Google Scholar
  14. Ronald S. Burt. Structural holes and good ideas. American Journal of Sociology, 110(2):349-399, 2004. Google Scholar
  15. Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210-223, February 1985. Google Scholar
  16. Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. Theoretically efficient parallel graph algorithms can be fast and scalable. In Proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures, page 393–404, 2018. Google Scholar
  17. Laxman Dhulipala, Jessica Shi, Tom Tseng, Guy E. Blelloch, and Julian Shun. The graph based benchmark suite (GBBS). In Proceedings of the Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), 2020. Google Scholar
  18. Ethan R. Elenberg, Karthikeyan Shanmugam, Michael Borokhovich, and Alexandros G. Dimakis. Beyond triangles: A distributed framework for estimating 3-profiles of large graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 229–238, 2015. Google Scholar
  19. Ethan R. Elenberg, Karthikeyan Shanmugam, Michael Borokhovich, and Alexandros G. Dimakis. Distributed estimation of graph 4-profiles. In Proceedings of the International Conference on World Wide Web, page 483–493, 2016. Google Scholar
  20. Giorgio Fagiolo. Clustering in complex directed networks. Physical Review E, 76(2):026107, 2007. Google Scholar
  21. Katherine Faust. A puzzle concerning triads in social networks: Graph constraints and the triad census. Social Networks, 32(3):221-233, 2010. Google Scholar
  22. J. Gil, Y. Matias, and U. Vishkin. Towards a theory of nearly constant time parallel algorithms. In Proceedings of the IEEE Symposium of Foundations of Computer Science, pages 698-710, 1991. Google Scholar
  23. Michael T. Goodrich and Paweł Pszona. External-memory network analysis algorithms for naturally sparse graphs. In Proceedings of the European Symposium on Algorithms, pages 664-676, 2011. Google Scholar
  24. Tomaz Hocevar and Janez Demsar. A combinatorial approach to graphlet counting. Bioinformatics, pages 559-65, 2014. Google Scholar
  25. Paul W. Holland and Samuel Leinhardt. A method for detecting structure in sociometric data. American Journal of Sociology, 76(3):492-513, 1970. Google Scholar
  26. Joseph Jaja. Introduction to Parallel Algorithms. Addison-Wesley Professional, 1992. Google Scholar
  27. Łukasz Kowalik. Short cycles in planar graphs. In Proceedings of the International Workshop on Graph-Theoretic Concepts in Computer Science, pages 284-296, 2003. Google Scholar
  28. Charles E. Leiserson. The Cilk++ concurrency platform. J. Supercomputing, 51(3), 2010. Google Scholar
  29. Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, 2019.
  30. David W. Matula and Leland L. Beck. Smallest-last ordering and clustering and graph coloring algorithms. J. ACM, 30(3), July 1983. Google Scholar
  31. Robert Meusel, Sebastiano Vigna, Oliver Lehmberg, and Christian Bizer. The graph structure in the Web-analyzed on different aggregation levels. The Journal of Web Science, 1(1):33-47, 2015. Google Scholar
  32. Crispin Nash-Williams. Decomposition of finite graphs into forests. Journal of the London Mathematical Society, s1-39(1):12-12, 1964. Google Scholar
  33. Rasmus Pagh and Charalampos E. Tsourakakis. Colorful triangle counting and a MapReduce implementation. Inf. Process. Lett., 112(7), March 2012. Google Scholar
  34. Ali Pinar, C. Seshadhri, and Vaidyanathan Vishal. ESCAPE: Efficiently counting all 5-vertex subgraphs. In Proceedings of the International Conference on World Wide Web, page 1431–1440, 2017. Google Scholar
  35. Xiafei Qiu, Wubin Cen, Zhengping Qian, You Peng, Ying Zhang, Xuemin Lin, and Jingren Zhou. Real-time constrained cycle detection in large dynamic graphs. Proc. VLDB Endow., 11(12):1876–1888, 2018. Google Scholar
  36. M. Rahman, M. A. Bhuiyan, and M. Al Hasan. Graft: An efficient graphlet counting method for large graph analysis. IEEE Transactions on Knowledge and Data Engineering, 26(10):2466-2478, 2014. Google Scholar
  37. S. Rajasekaran and J. H. Reif. Optimal and sublogarithmic time randomized parallel sorting algorithms. SIAM J. Comput., 18(3):594-607, June 1989. Google Scholar
  38. Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sariyuce, and Srikanta Tirthapura. Butterfly counting in bipartite networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 2150–2159, 2018. Google Scholar
  39. Ahmet Erdem Sarıyüce and Ali Pinar. Peeling bipartite networks for dense subgraph discovery. In Proceedings of the ACM International Conference on Web Search and Data Mining, page 504–512, 2018. Google Scholar
  40. T. Schank. Algorithmic aspects of triangle-based network analysis. PhD Thesis, Universitat Karlsruhe, 2007. Google Scholar
  41. Jessica Shi, Laxman Dhulipala, and Julian Shun. Parallel clique counting and peeling algorithms. arXiv preprint arXiv:2002.10047, 2020. URL: http://arxiv.org/abs/2002.10047.
  42. Jessica Shi and Julian Shun. Parallel algorithms for butterfly computations. In Bruce M. Maggs, editor, Proceedings of the SIAM Symposium on Algorithmic Principles of Computer Systems, pages 16-30, 2020. Google Scholar
  43. Charalampos E. Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis, and Christos Faloutsos. Spectral counting of triangles via element-wise sparsification and triangle-based link recommendation. Social Network Analysis and Mining, 1(2):75-81, April 2011. Google Scholar
  44. J. Wang, A. W. Fu, and J. Cheng. Rectangle counting in large bipartite graphs. In Proceedings of the IEEE International Congress on Big Data, pages 17-24, 2014. Google Scholar
  45. Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. Vertex priority based butterfly counting for large-scale bipartite networks. Proc. VLDB Endow., 12(10):1139–1152, 2019. Google Scholar
  46. Sebastian Wernicke and Florian Rasche. FANMOD: a tool for fast network motif detection. Bioinformatics, 22(9):1152-1153, 2006. Google Scholar
  47. Xiangzhou Xia. Efficient and scalable listing of four-vertex subgraphs. Master’s thesis, Texas A&M University, 2016. Google Scholar
  48. R. Zhu, Z. Zou, and J. Li. Fast rectangle counting on massive networks. In Proceedings of the IEEE International Conference on Data Mining, pages 847-856, 2018. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail