Counting Simplices in Hypergraph Streams

Authors Amit Chakrabarti , Themistoklis Haris



PDF
Thumbnail PDF

File

LIPIcs.ESA.2022.32.pdf
  • Filesize: 0.83 MB
  • 19 pages

Document Identifiers

Author Details

Amit Chakrabarti
  • Dartmouth College, Hanover, NH, USA
Themistoklis Haris
  • Dartmouth College, Hanover, NH, USA

Cite AsGet BibTex

Amit Chakrabarti and Themistoklis Haris. Counting Simplices in Hypergraph Streams. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 32:1-32:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ESA.2022.32

Abstract

We consider the problem of space-efficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highly-studied problem of estimating the number of triangles in a graph stream. Our input is a k-uniform hypergraph H with n vertices and m hyperedges, each hyperedge being a k-sized subset of vertices. A k-simplex in H is a subhypergraph on k+1 vertices X such that all k+1 possible hyperedges among X exist in H. The goal is to process the hyperedges of H, which arrive in an arbitrary order as a data stream, and compute a good estimate of T_k(H), the number of k-simplices in H. We design a suite of algorithms for this problem. As with triangle-counting in graphs (which is the special case k = 2), sublinear space is achievable but only under a promise of the form T_k(H) ≥ T. Under such a promise, our algorithms use at most four passes and together imply a space bound of O(ε^{-2} log δ^{-1} polylog n ⋅ min{(m^{1+1/k})/T, m/(T^{2/(k+1)})}) for each fixed k ≥ 3, in order to guarantee an estimate within (1±ε)T_k(H) with probability ≥ 1-δ. We also give a simpler 1-pass algorithm that achieves O(ε^{-2} log δ^{-1} log n⋅ (m/T) (Δ_E + Δ_V^{1-1/k})) space, where Δ_E (respectively, Δ_V) denotes the maximum number of k-simplices that share a hyperedge (respectively, a vertex), which generalizes a previous result for the k = 2 case. We complement these algorithmic results with space lower bounds of the form Ω(ε^{-2}), Ω(m^{1+1/k}/T), Ω(m/T^{1-1/k}) and Ω(mΔ_V^{1/k}/T) for multi-pass algorithms and Ω(mΔ_E/T) for 1-pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sketching and sampling
Keywords
  • data streaming
  • graph algorithms
  • hypergraphs
  • sub-linear algorithms
  • triangle counting

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Noga Alon, Raphael Yuster, and Uri Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209-223, 1997. Preliminary version in Proc. 2nd Annual European Symposium on Algorithms , pages 354-364, 1994. Google Scholar
  2. Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A simple sublinear-time algorithm for counting arbitrary subgraphs via edge sampling. In Proc. 10th Conference on Innovations in Theoretical Computer Science, volume 124 of LIPIcs, pages 6:1-6:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. Google Scholar
  3. Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 623-632, 2002. Google Scholar
  4. Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proc. 14th Annual SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 16-24, 2008. Google Scholar
  5. Suman K. Bera and Amit Chakrabarti. Towards tighter space bounds for counting triangles and other substructures in graph streams. In Proc. 34th International Symposium on Theoretical Aspects of Computer Science, pages 11:1-11:14, 2017. Google Scholar
  6. Vladimir Braverman, Rafail Ostrovsky, and Dan Vilenchik. How hard is counting triangles in the streaming model? In Proc. 40th International Colloquium on Automata, Languages and Programming, pages 244-254. Springer, 2013. Google Scholar
  7. Laurent Bulteau, Vincent Froese, Konstantin Kutzkov, and Rasmus Pagh. Triangle counting in dynamic graph streams. Algorithmica, 76(1):259-278, 2016. URL: https://doi.org/10.1007/s00453-015-0036-4.
  8. Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. Counting triangles in data streams. In Proc. 25th ACM Symposium on Principles of Database Systems, pages 253-262, 2006. Google Scholar
  9. Amit Chakrabarti and Themistoklis Haris. Counting simplices in hypergraph streams. arXiv preprint, 2021. URL: http://arxiv.org/abs/2112.11016.
  10. Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210-223, 1985. Google Scholar
  11. Nicholas A Christakis and James H Fowler. Social network sensors for early detection of contagious outbreaks. PLoS ONE, 5(9):e12948, 2010. Google Scholar
  12. Graham Cormode and Hossein Jowhari. A second look at counting triangles in graph streams. Theor. Comput. Sci., 552:44-51, 2014. Google Scholar
  13. Graham Cormode and Hossein Jowhari. A second look at counting triangles in graph streams (corrected). Theor. Comput. Sci., 683:22-30, 2017. Google Scholar
  14. Lech Duraj, Krzysztof Kleiner, Adam Polak, and Virginia Vassilevska Williams. Equivalences between triangle and range query problems. In Shuchi Chawla, editor, Proc. 31st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 30-47. SIAM, 2020. Google Scholar
  15. Talya Eden, Amit Levi, Dana Ron, and C. Seshadhri. Approximately counting triangles in sublinear time. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 614-633, 2015. Google Scholar
  16. Talya Eden, Dana Ron, and C. Seshadhri. On approximating the number of k-cliques in sublinear time. SIAM J. Comput., 49(4):747-771, 2020. Google Scholar
  17. David C. Fisher. Lower bounds on the number of triangles in a graph. J. Graph Th., 13(4):505-512, 1989. Google Scholar
  18. Allan Grønlund and Seth Pettie. Threesomes, degenerates, and love triangles. J. ACM, 65(4):22:1-22:25, 2018. Google Scholar
  19. Chuangyi Gui, Long Zheng, Pengcheng Yao, Xiaofei Liao, and Hai Jin. Fast triangle counting on GPU. In 2019 IEEE High Performance Extreme Computing Conference, HPEC 2019, Waltham, MA, USA, September 24-26, 2019, pages 1-7. IEEE, 2019. Google Scholar
  20. Rajesh Jayaram and John Kallaugher. An optimal algorithm for triangle counting in the stream. In Proc. 24th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, volume 207 of LIPIcs, pages 11:1-11:11, 2021. Google Scholar
  21. Madhav Jha, C. Seshadhri, and Ali Pinar. A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. ACM Trans. Knowl. Discov. Data, 9(3):15:1-15:21, 2015. Google Scholar
  22. Hossein Jowhari and Mohammad Ghodsi. New streaming algorithms for counting triangles in graphs. In Lusheng Wang, editor, Proc. 11th Annual International Conference on Computing and Combinatorics (COCOON), pages 710-716, 2005. Google Scholar
  23. John Kallaugher, Michael Kapralov, and Eric Price. The sketching complexity of graph and hypergraph counting. In Proc. 59th Annual IEEE Symposium on Foundations of Computer Science, pages 556-567. IEEE Computer Society, 2018. Google Scholar
  24. John Kallaugher, Andrew McGregor, Eric Price, and Sofya Vorotnikova. The complexity of counting cycles in the adjacency list streaming model. In Dan Suciu, Sebastian Skritek, and Christoph Koch, editors, Proc. 38th ACM Symposium on Principles of Database Systems, pages 119-133. ACM, 2019. Google Scholar
  25. John Kallaugher and Eric Price. A hybrid sampling scheme for triangle counting. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1778-1797. SIAM, 2017. Google Scholar
  26. Daniel M. Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. Counting arbitrary subgraphs in data streams. In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Roger Wattenhofer, editors, Proc. 39th International Colloquium on Automata, Languages and Programming, volume 7392 of Lecture Notes in Computer Science, pages 598-609. Springer, 2012. Google Scholar
  27. Jeong Han Kim and Van H. Vu. Divide and conquer martingales and the number of triangles in a random graph. Rand. Struct. Alg., 24(2):166-174, 2004. Google Scholar
  28. Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1-2):161-185, 2012. Google Scholar
  29. Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. Predicting positive and negative links in online social networks. In Proc. 19th International Conference on World Wide Web (WWW), pages 641-650, 2010. Google Scholar
  30. Madhusudan Manjunath, Kurt Mehlhorn, Konstantinos Panagiotou, and He Sun. Approximate counting of cycles in streams. In Camil Demetrescu and Magnús M. Halldórsson, editors, Proc. 19th Annual European Symposium on Algorithms, volume 6942 of Lecture Notes in Computer Science, pages 677-688. Springer, 2011. Google Scholar
  31. Willem Mantel. Problem 28 (solution by H. Gouweniak, W. Mantel, J. Texeira de Mattes, F. Schuh, and WA Whythoff). Wiskundige Opgaven, 10:60-61, 1907. Google Scholar
  32. Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. Better algorithms for counting triangles in data streams. In Proc. 35th ACM Symposium on Principles of Database Systems, pages 401-411, 2016. Google Scholar
  33. Mark EJ Newman. The structure and function of complex networks. SIAM review, 45(2):167-256, 2003. Google Scholar
  34. Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case optimal join algorithms. J. ACM, 65(3):16:1-16:40, 2018. Google Scholar
  35. Rasmus Pagh and Charalampos E. Tsourakakis. Colorful triangle counting and a MapReduce implementation. Information Processing Letters, 112(7):277-281, 2012. Google Scholar
  36. A. Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. Counting and sampling triangles from a graph stream. Proc. VLDB Endowment, 6(14):1870-1881, 2013. Google Scholar
  37. Alexander A. Razborov. On the minimal density of triangles in graphs. Comb. Probab. Comput., 17(4):603-618, 2008. Google Scholar
  38. Jacques Rougemont and Pascal Hingamp. Dna microarray data and contextual analysis of correlation graphs. BMC bioinformatics, 4(1):1-11, 2003. Google Scholar
  39. C. Seshadhri, Ali Pinar, and Tamara G. Kolda. Wedge sampling for computing clustering coefficients and triangle counts on large graphs. Stat. Anal. Data Min., 7(4):294-307, 2014. Google Scholar
  40. Siddharth Suri and Sergei Vassilvitskii. Counting triangles and the curse of the last reducer. In Proc. 20th International Conference on World Wide Web (WWW), pages 607-614. ACM, 2011. Google Scholar
  41. Charalampos E. Tsourakakis, Mihail N. Kolountzakis, and Gary L. Miller. Triangle sparsifiers. J. Graph Algorithms Appl., 15(6):703-726, 2011. Google Scholar
  42. Sofya Vorotnikova. Improved 3-pass algorithm for counting 4-cycles in arbitrary order streaming. CoRR, abs/2007.13466, 2020. URL: http://arxiv.org/abs/2007.13466.
  43. Virginia Vassilevska Williams. 3sum and related problems in fine-grained complexity (invited talk). In Proc. 37th ACM Symposium on Computational Geometry, volume 189 of LIPIcs, pages 2:1-2:2. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. Google Scholar
  44. Zhiang Wu, Jie Cao, Yaqiong Wang, Youquan Wang, Lu Zhang, and Junjie Wu. hPSD: a hybrid pu-learning-based spammer detection model for product reviews. IEEE transactions on cybernetics, 50(4):1595-1606, 2018. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail