Counting Simplices in Hypergraph Streams
We consider the problem of space-efficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highly-studied problem of estimating the number of triangles in a graph stream. Our input is a k-uniform hypergraph H with n vertices and m hyperedges, each hyperedge being a k-sized subset of vertices. A k-simplex in H is a subhypergraph on k+1 vertices X such that all k+1 possible hyperedges among X exist in H. The goal is to process the hyperedges of H, which arrive in an arbitrary order as a data stream, and compute a good estimate of T_k(H), the number of k-simplices in H.
We design a suite of algorithms for this problem. As with triangle-counting in graphs (which is the special case k = 2), sublinear space is achievable but only under a promise of the form T_k(H) ≥ T. Under such a promise, our algorithms use at most four passes and together imply a space bound of O(ε^{-2} log δ^{-1} polylog n ⋅ min{(m^{1+1/k})/T, m/(T^{2/(k+1)})}) for each fixed k ≥ 3, in order to guarantee an estimate within (1±ε)T_k(H) with probability ≥ 1-δ. We also give a simpler 1-pass algorithm that achieves O(ε^{-2} log δ^{-1} log n⋅ (m/T) (Δ_E + Δ_V^{1-1/k})) space, where Δ_E (respectively, Δ_V) denotes the maximum number of k-simplices that share a hyperedge (respectively, a vertex), which generalizes a previous result for the k = 2 case. We complement these algorithmic results with space lower bounds of the form Ω(ε^{-2}), Ω(m^{1+1/k}/T), Ω(m/T^{1-1/k}) and Ω(mΔ_V^{1/k}/T) for multi-pass algorithms and Ω(mΔ_E/T) for 1-pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs.
data streaming
graph algorithms
hypergraphs
sub-linear algorithms
triangle counting
Theory of computation~Sketching and sampling
32:1-32:19
Regular Paper
This work was supported in part by NSF under Award CCF-1907738.
https://arxiv.org/abs/2112.11016
Amit
Chakrabarti
Amit Chakrabarti
Dartmouth College, Hanover, NH, USA
http://www.cs.dartmouth.edu/~ac
https://orcid.org/0000-0003-3633-9180
Themistoklis
Haris
Themistoklis Haris
Dartmouth College, Hanover, NH, USA
10.4230/LIPIcs.ESA.2022.32
Noga Alon, Raphael Yuster, and Uri Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209-223, 1997. Preliminary version in Proc. 2nd Annual European Symposium on Algorithms , pages 354-364, 1994.
Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A simple sublinear-time algorithm for counting arbitrary subgraphs via edge sampling. In Proc. 10th Conference on Innovations in Theoretical Computer Science, volume 124 of LIPIcs, pages 6:1-6:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 623-632, 2002.
Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proc. 14th Annual SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 16-24, 2008.
Suman K. Bera and Amit Chakrabarti. Towards tighter space bounds for counting triangles and other substructures in graph streams. In Proc. 34th International Symposium on Theoretical Aspects of Computer Science, pages 11:1-11:14, 2017.
Vladimir Braverman, Rafail Ostrovsky, and Dan Vilenchik. How hard is counting triangles in the streaming model? In Proc. 40th International Colloquium on Automata, Languages and Programming, pages 244-254. Springer, 2013.
Laurent Bulteau, Vincent Froese, Konstantin Kutzkov, and Rasmus Pagh. Triangle counting in dynamic graph streams. Algorithmica, 76(1):259-278, 2016. URL: https://doi.org/10.1007/s00453-015-0036-4.
https://doi.org/10.1007/s00453-015-0036-4
Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. Counting triangles in data streams. In Proc. 25th ACM Symposium on Principles of Database Systems, pages 253-262, 2006.
Amit Chakrabarti and Themistoklis Haris. Counting simplices in hypergraph streams. arXiv preprint, 2021. URL: http://arxiv.org/abs/2112.11016.
http://arxiv.org/abs/2112.11016
Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210-223, 1985.
Nicholas A Christakis and James H Fowler. Social network sensors for early detection of contagious outbreaks. PLoS ONE, 5(9):e12948, 2010.
Graham Cormode and Hossein Jowhari. A second look at counting triangles in graph streams. Theor. Comput. Sci., 552:44-51, 2014.
Graham Cormode and Hossein Jowhari. A second look at counting triangles in graph streams (corrected). Theor. Comput. Sci., 683:22-30, 2017.
Lech Duraj, Krzysztof Kleiner, Adam Polak, and Virginia Vassilevska Williams. Equivalences between triangle and range query problems. In Shuchi Chawla, editor, Proc. 31st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 30-47. SIAM, 2020.
Talya Eden, Amit Levi, Dana Ron, and C. Seshadhri. Approximately counting triangles in sublinear time. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 614-633, 2015.
Talya Eden, Dana Ron, and C. Seshadhri. On approximating the number of k-cliques in sublinear time. SIAM J. Comput., 49(4):747-771, 2020.
David C. Fisher. Lower bounds on the number of triangles in a graph. J. Graph Th., 13(4):505-512, 1989.
Allan Grønlund and Seth Pettie. Threesomes, degenerates, and love triangles. J. ACM, 65(4):22:1-22:25, 2018.
Chuangyi Gui, Long Zheng, Pengcheng Yao, Xiaofei Liao, and Hai Jin. Fast triangle counting on GPU. In 2019 IEEE High Performance Extreme Computing Conference, HPEC 2019, Waltham, MA, USA, September 24-26, 2019, pages 1-7. IEEE, 2019.
Rajesh Jayaram and John Kallaugher. An optimal algorithm for triangle counting in the stream. In Proc. 24th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, volume 207 of LIPIcs, pages 11:1-11:11, 2021.
Madhav Jha, C. Seshadhri, and Ali Pinar. A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. ACM Trans. Knowl. Discov. Data, 9(3):15:1-15:21, 2015.
Hossein Jowhari and Mohammad Ghodsi. New streaming algorithms for counting triangles in graphs. In Lusheng Wang, editor, Proc. 11th Annual International Conference on Computing and Combinatorics (COCOON), pages 710-716, 2005.
John Kallaugher, Michael Kapralov, and Eric Price. The sketching complexity of graph and hypergraph counting. In Proc. 59th Annual IEEE Symposium on Foundations of Computer Science, pages 556-567. IEEE Computer Society, 2018.
John Kallaugher, Andrew McGregor, Eric Price, and Sofya Vorotnikova. The complexity of counting cycles in the adjacency list streaming model. In Dan Suciu, Sebastian Skritek, and Christoph Koch, editors, Proc. 38th ACM Symposium on Principles of Database Systems, pages 119-133. ACM, 2019.
John Kallaugher and Eric Price. A hybrid sampling scheme for triangle counting. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1778-1797. SIAM, 2017.
Daniel M. Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. Counting arbitrary subgraphs in data streams. In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Roger Wattenhofer, editors, Proc. 39th International Colloquium on Automata, Languages and Programming, volume 7392 of Lecture Notes in Computer Science, pages 598-609. Springer, 2012.
Jeong Han Kim and Van H. Vu. Divide and conquer martingales and the number of triangles in a random graph. Rand. Struct. Alg., 24(2):166-174, 2004.
Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1-2):161-185, 2012.
Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. Predicting positive and negative links in online social networks. In Proc. 19th International Conference on World Wide Web (WWW), pages 641-650, 2010.
Madhusudan Manjunath, Kurt Mehlhorn, Konstantinos Panagiotou, and He Sun. Approximate counting of cycles in streams. In Camil Demetrescu and Magnús M. Halldórsson, editors, Proc. 19th Annual European Symposium on Algorithms, volume 6942 of Lecture Notes in Computer Science, pages 677-688. Springer, 2011.
Willem Mantel. Problem 28 (solution by H. Gouweniak, W. Mantel, J. Texeira de Mattes, F. Schuh, and WA Whythoff). Wiskundige Opgaven, 10:60-61, 1907.
Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. Better algorithms for counting triangles in data streams. In Proc. 35th ACM Symposium on Principles of Database Systems, pages 401-411, 2016.
Mark EJ Newman. The structure and function of complex networks. SIAM review, 45(2):167-256, 2003.
Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case optimal join algorithms. J. ACM, 65(3):16:1-16:40, 2018.
Rasmus Pagh and Charalampos E. Tsourakakis. Colorful triangle counting and a MapReduce implementation. Information Processing Letters, 112(7):277-281, 2012.
A. Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. Counting and sampling triangles from a graph stream. Proc. VLDB Endowment, 6(14):1870-1881, 2013.
Alexander A. Razborov. On the minimal density of triangles in graphs. Comb. Probab. Comput., 17(4):603-618, 2008.
Jacques Rougemont and Pascal Hingamp. Dna microarray data and contextual analysis of correlation graphs. BMC bioinformatics, 4(1):1-11, 2003.
C. Seshadhri, Ali Pinar, and Tamara G. Kolda. Wedge sampling for computing clustering coefficients and triangle counts on large graphs. Stat. Anal. Data Min., 7(4):294-307, 2014.
Siddharth Suri and Sergei Vassilvitskii. Counting triangles and the curse of the last reducer. In Proc. 20th International Conference on World Wide Web (WWW), pages 607-614. ACM, 2011.
Charalampos E. Tsourakakis, Mihail N. Kolountzakis, and Gary L. Miller. Triangle sparsifiers. J. Graph Algorithms Appl., 15(6):703-726, 2011.
Sofya Vorotnikova. Improved 3-pass algorithm for counting 4-cycles in arbitrary order streaming. CoRR, abs/2007.13466, 2020. URL: http://arxiv.org/abs/2007.13466.
http://arxiv.org/abs/2007.13466
Virginia Vassilevska Williams. 3sum and related problems in fine-grained complexity (invited talk). In Proc. 37th ACM Symposium on Computational Geometry, volume 189 of LIPIcs, pages 2:1-2:2. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
Zhiang Wu, Jie Cao, Yaqiong Wang, Youquan Wang, Lu Zhang, and Junjie Wu. hPSD: a hybrid pu-learning-based spammer detection model for product reviews. IEEE transactions on cybernetics, 50(4):1595-1606, 2018.
Amit Chakrabarti and Themistoklis Haris
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode