Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques in Bounded Arboricity Graphs

Eden, Talya; Ron, Dana; Rosenbaum, Will

doi:10.4230/LIPIcs.ICALP.2022.56

Abstract

Counting and sampling small subgraphs are fundamental algorithmic tasks. Motivated by the need to handle massive datasets efficiently, recent theoretical work has examined the problems in the sublinear time regime. In this work, we consider the problem of sampling a k-clique in a graph from an almost uniform distribution. Specifically the algorithm should output each k-clique with probability (1±ε)/n_k, where n_k denotes the number of k-cliques in the graph and ε is a given approximation parameter. To this end, the algorithm may perform degree, neighbor, and pair queries. We focus on the class of graphs with arboricity at most α, and prove that the query complexity of the problem is Θ^*(min{nα , max {(((nα)^(k/2))/n_k)^{1/(k-1)}, (nα^(k-1))/n_k}}), where n is the number of vertices in the graph, and Θ^*(⋅) suppresses dependencies on (log n/ε)^O(k).
Our upper bound is based on defining a special auxiliary graph H_k, such that sampling edges almost uniformly in H_k translates to sampling k-cliques almost uniformly in the original graph G. We then build on a known edge-sampling algorithm (Eden, Ron and Rosenbaum, ICALP19) to sample edges in H_k. The challenge is simulating queries to H_k while being given query access only to G. Our lower bound follows from a construction of a family of graphs with arboricity α such that in each graph there are n_k k-cliques, where one of these cliques is "hidden" and hence hard to sample.

Balázs Adamcsek, Gergely Palla, Illés J Farkas, Imre Derényi, and Tamás Vicsek. Cfinder: locating cliques and overlapping modules in biological networks. Bioinformatics, 22(8):1021-1023, 2006.
Maryam Aliakbarpour, Amartya Shankha Biswas, Themis Gouleakis, John Peebles, Ronitt Rubinfeld, and Anak Yodpinyanee. Sublinear-time algorithms for counting star subgraphs via edge sampling. Algorithmica, 80(2):668-697, 2018.
Uri Alon. An introduction to systems biology: design principles of biological circuits. Chapman and Hall/CRC, 2006.
Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. In Avrim Blum, editor, 10th Innovations in Theoretical Computer Science Conference (ITCS 2019), volume 124 of Leibniz International Proceedings in Informatics (LIPIcs), pages 6:1-6:20, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ITCS.2019.6.
Paul Beame, Sariel Har-Peled, Sivaramakrishnan Natarajan Ramamoorthy, Cyrus Rashtchian, and Makrand Sinha. Edge estimation with independent set oracles. ACM Transactions on Algorithms (TALG), 16(4):1-27, 2020.
Anup Bhattacharya, Arijit Bishnu, Arijit Ghosh, and Gopinath Mishra. Triangle estimation using tripartite independent set queries. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
Mansurul A Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan. Guise: Uniform sampling of graphlets for large graph analysis. In 2012 IEEE 12th International Conference on Data Mining, pages 91-100. IEEE, 2012.
Arijit Bishnu, Arijit Ghosh, and Gopinath Mishra. Bipartite independent set oracles and beyond: Can it even count triangles in polylogarithmic queries? arXiv preprint, 2021. URL: http://arxiv.org/abs/2110.03836.
Amartya Shankha Biswas, Talya Eden, Quanquan C. Liu, Slobodan Mitrovic, and Ronitt Rubinfeld. Parallel algorithms for small subgraph counting. CoRR, abs/2002.08299, 2020. URL: http://arxiv.org/abs/2002.08299.
Amartya Shankha Biswas, Talya Eden, and Ronitt Rubinfeld. Towards a decomposition-optimal algorithm for counting and sampling arbitrary motifs in sublinear time. In Mary Wootters and Laura Sanità, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2021, August 16-18, 2021, University of Washington, Seattle, Washington, USA (Virtual Conference), volume 207 of LIPIcs, pages 55:1-55:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.55.
Marco Bressan. Efficient and near-optimal algorithms for sampling connected subgraphs. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 1132-1143, New York, NY, USA, 2021. Association for Computing Machinery. URL: https://doi.org/10.1145/3406325.3451042.
Marco Bressan, Flavio Chierichetti, Ravi Kumar, Stefano Leucci, and Alessandro Panconesi. Motif counting beyond five nodes. ACM Transactions on Knowledge Discovery from Data (TKDD), 12(4):1-25, 2018.
Broto Chakrabarty and Nita Parekh. Naps: network analysis of protein structures. Nucleic acids research, 44(W1):W375-W382, 2016.
Jianer Chen, Xiuzhen Huang, Iyad A Kanj, and Ge Xia. Strong computational lower bounds via parameterized complexity. Journal of Computer and System Sciences, 72(8):1346-1367, 2006.
Xi Chen, Amit Levi, and Erik Waingarten. Nearly optimal edge estimation with independent set queries. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2916-2935. SIAM, 2020.
Xiaowei Chen, Yongkun Li, Pinghui Wang, and John CS Lui. A general framework for estimating graphlet statistics via random walk. Proceedings of the VLDB Endowment, 10(3), 2016.
Maximilien Danisch, Oana Balalau, and Mauro Sozio. Listing k-cliques in sparse real-world graphs. In Proceedings of the 2018 World Wide Web Conference, pages 589-598, 2018.
Dhruba Deb, Saraswathi Vishveshwara, and Smitha Vishveshwara. Understanding protein structure from a percolation perspective. Biophysical journal, 97(6):1787-1794, 2009.
Nurcan Durak, Ali Pinar, Tamara G. Kolda, and C. Seshadhri. Degree relations of triangles in real-world networks and graph models. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM '12, pages 1712-1716, New York, NY, USA, 2012. Association for Computing Machinery. URL: https://doi.org/10.1145/2396761.2398503.
Talya Eden, Amit Levi, Dana Ron, and C Seshadhri. Approximately counting triangles in sublinear time. In Foundations of Computer Science , 2015 IEEE 56th Annual Symposium on, pages 614-633. IEEE, 2015.
Talya Eden, Saleet Mossel, and Dana Ron. Approximating the arboricity in sublinear time. In Joseph (Seffi) Naor and Niv Buchbinder, editors, Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, Virtual Conference / Alexandria, VA, USA, January 9 - 12, 2022, pages 2404-2425. SIAM, 2022. URL: https://doi.org/10.1137/1.9781611977073.96.
Talya Eden, Saleet Mossel, and Ronitt Rubinfeld. Sampling multiple edges efficiently. In Mary Wootters and Laura Sanità, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2021, August 16-18, 2021, University of Washington, Seattle, Washington, USA (Virtual Conference), volume 207 of LIPIcs, pages 51:1-51:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.51.
Talya Eden, Dana Ron, and Will Rosenbaum. The arboricity captures the complexity of sampling edges. In 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece., pages 52:1-52:14, 2019. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.52.
Talya Eden, Dana Ron, and Will Rosenbaum. Almost optimal bounds for sublinear-time sampling of k-cliques: Sampling cliques is harder than counting. CoRR, abs/2012.04090, 2020. URL: http://arxiv.org/abs/2012.04090.
Talya Eden, Dana Ron, and C. Seshadhri. Sublinear time estimation of degree distribution moments: The arboricity connection. SIAM J. Discrete Math., 33(4):2267-2285, 2019. URL: https://doi.org/10.1137/17M1159014.
Talya Eden, Dana Ron, and C. Seshadhri. Faster sublinear approximation of the number of k-cliques in low-arboricity graphs. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1467-1478, 2020. URL: https://doi.org/10.1137/1.9781611975994.89.
Talya Eden, Dana Ron, and C. Seshadhri. On approximating the number of k-cliques in sublinear time. SIAM Journal on Computing, 49(4):747-771, 2020. URL: https://doi.org/10.1137/18M1176701.
Talya Eden and Will Rosenbaum. Lower bounds for approximating graph parameters via communication complexity. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques 2018, pages 11:1-11:18, 2018. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.11.
Talya Eden and Will Rosenbaum. On sampling edges almost uniformly. In 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA, pages 7:1-7:9, 2018. URL: https://doi.org/10.4230/OASIcs.SOSA.2018.7.
David Eppstein and Darren Strash. Listing all maximal cliques in large sparse real-world graphs. In International Symposium on Experimental Algorithms, pages 364-375. Springer, 2011.
Uriel Feige. On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SIAM Journal on Computing, 35(4):964-984, 2006.
Hendrik Fichtenberger, Mingze Gao, and Pan Peng. Sampling arbitrary subgraphs exactly uniformly in sublinear time. In 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), pages 45:1-45:13, 2020. URL: https://doi.org/10.4230/LIPIcs.ICALP.2020.45.
Irene Finocchi, Marco Finocchi, and Emanuele G Fusco. Clique counting in mapreduce: Algorithms and experiments. Journal of Experimental Algorithmics (JEA), 20:1-20, 2015.
Gaurav Goel and Jens Gustedt. Bounded arboricity to determine the local structure of sparse graphs. In International Workshop on Graph-Theoretic Concepts in Computer Science, pages 159-167. Springer, 2006.
Oded Goldreich and Dana Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473-493, 2008. URL: https://doi.org/10.1002/rsa.20203.
Mira Gonen, Dana Ron, and Yuval Shavitt. Counting stars and other small subgraphs in sublinear-time. SIAM Journal on Discrete Mathematics, 25(3):1365-1411, 2011.
Shweta Jain and C Seshadhri. A fast and provable method for estimating clique counts using turán’s theorem. In Proceedings of the 26th international conference on world wide web, pages 441-449, 2017.
Nadav Kashtan, Shalev Itzkovitz, Ron Milo, and Uri Alon. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics, 20(11):1746-1758, 2004.
Idit Kosti, Predrag Radivojac, and Yael Mandel-Gutfreund. An integrated regulatory network reveals pervasive cross-regulation among transcription and splicing factors. PLoS computational biology, 8(7):e1002603, 2012.
Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu. Densest subgraph in dynamic graph streams. In Giuseppe F. Italiano, Giovanni Pighizzini, and Donald Sannella, editors, Mathematical Foundations of Computer Science 2015 - 40th International Symposium, MFCS 2015, Milan, Italy, August 24-28, 2015, Proceedings, Part II, volume 9235 of Lecture Notes in Computer Science, pages 472-482. Springer, 2015. URL: https://doi.org/10.1007/978-3-662-48054-0_39.
Michael Mitzenmacher, Jakub Pachocki, Richard Peng, Charalampos Tsourakakis, and Shen Chen Xu. Scalable large near-clique detection in large-scale networks via sampling. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 815-824, 2015.
Tanay Kumar Saha and Mohammad Al Hasan. Finding network motifs using mcmc sampling. In Complex Networks VI, pages 13-24. Springer, 2015.
Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. Patterns and anomalies in k-cores of real-world graphs with applications. Knowledge and Information Systems, 54(3):677-710, 2018.
Jakub Tetek. Approximate triangle counting via sampling and fast matrix multiplication. CoRR, abs/2104.08501, 2021. To appear in ICALP 2022. URL: http://arxiv.org/abs/2104.08501.
Jakub Tětek and Mikkel Thorup. Edge sampling and graph parameter estimation via vertex neighborhood accesses. CoRR, 2021. To appear in STOC 2022. URL: http://arxiv.org/abs/2107.03821.
Virginia Vassilevska. Efficient algorithms for clique problems. Information Processing Letters, 109(4):254-257, 2009.
Pinghui Wang, John CS Lui, Bruno Ribeiro, Don Towsley, Junzhou Zhao, and Xiaohong Guan. Efficiently estimating motif statistics of large networks. ACM Transactions on Knowledge Discovery from Data (TKDD), 9(2):1-27, 2014.

Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques in Bounded Arboricity Graphs

Authors Talya Eden , Dana Ron , Will Rosenbaum

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques in Bounded Arboricity Graphs

Authors Talya Eden , Dana Ron , Will Rosenbaum

File

Document Identifiers

Related Versions

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

References

Thanks for your feedback!

Could not send message