A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling

Authors: Sepehr Assadi, Michael Kapralov, Sanjeev Khanna

Author Details

Sepehr Assadi
  • Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
Michael Kapralov
  • School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
Sanjeev Khanna
  • Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA

Cite As

Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. In 10th Innovations in Theoretical Computer Science Conference (ITCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 124, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.ITCS.2019.6

Abstract

In the subgraph counting problem, we are given a (large) input graph G(V, E) and a (small) target graph H (e.g., a triangle); the goal is to estimate the number of occurrences of H in G. Our focus here is on designing sublinear-time algorithms for approximately computing the number of occurrences of H in G in the setting where the algorithm is given query access to G. This problem has been studied in several recent papers, which primarily focused on specific families of graphs H such as triangles, cliques, and stars. However, not much is known in the literature about approximate counting of arbitrary graphs H. This is in sharp contrast to the closely related subgraph enumeration problem, which has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any subgraph H in a graph G with m edges is O(m^{rho(H)}), where rho(H) is the fractional edge-cover number of H, and enumeration algorithms with matching runtime are known for any H.
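To make the exponent rho(H) concrete, the following is a minimal sketch, not taken from the paper's text above, that computes the fractional edge-cover number of a small graph H by brute force. It assumes the known fact that the fractional edge-cover linear program of a graph has a half-integral optimal solution, so trying weights in {0, 1/2, 1} on every edge is enough; the function name `fractional_edge_cover_number` is hypothetical, and the search is exponential in the number of edges of H, so it is only meant for tiny targets.

```python
from fractions import Fraction
from itertools import product


def fractional_edge_cover_number(edges):
    """Return rho(H) for a small graph H given as a list of (u, v) edges.

    Assumption: an optimal fractional edge cover with weights in
    {0, 1/2, 1} exists, so a brute-force search over those weights suffices.
    """
    vertices = {v for e in edges for v in e}
    weights = (Fraction(0), Fraction(1, 2), Fraction(1))
    best = None
    for assignment in product(weights, repeat=len(edges)):
        # Total weight received by each vertex from its incident edges.
        covered = {v: Fraction(0) for v in vertices}
        for (u, v), x in zip(edges, assignment):
            covered[u] += x
            covered[v] += x
        # Feasible cover: every vertex receives total weight at least 1.
        if all(c >= 1 for c in covered.values()):
            total = sum(assignment, Fraction(0))
            if best is None or total < best:
                best = total
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
star_3 = [(0, 1), (0, 2), (0, 3)]
print(fractional_edge_cover_number(triangle))  # weight 1/2 on each edge
print(fractional_edge_cover_number(star_3))    # each leaf needs its own edge
```

For a triangle the result is 3/2, which gives the familiar O(m^{3/2}) bound on the number of triangles in a graph with m edges.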
We bridge this gap between subgraph counting and subgraph enumeration by designing a simple sublinear-time algorithm that can estimate the number of occurrences of any arbitrary graph H in G, denoted by #H, to within a (1 +/- epsilon)-approximation with high probability in O(m^{rho(H)}/#H) * poly(log(n), 1/epsilon) time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries, and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches that of the algorithms of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extends it to all choices of subgraph H under the additional assumption of edge-sample queries.
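The query model described above can be illustrated with a small sketch for the special case where H is a triangle. This is a simplified unbiased estimator written for illustration, not the algorithm of the paper, and it comes with no claim about the number of trials needed. The class `QueryGraph`, the function `estimate_triangles`, and the assumptions that G is a simple graph and that m is known to the caller are all hypothetical choices made here.

```python
import random


class QueryGraph:
    """Toy in-memory simple graph exposing the four query types."""

    def __init__(self, edges):
        self.edges = [tuple(e) for e in edges]
        self.adj = {}
        for u, v in self.edges:
            self.adj.setdefault(u, []).append(v)
            self.adj.setdefault(v, []).append(u)
        self.edge_set = {frozenset(e) for e in self.edges}

    def degree(self, v):            # degree query
        return len(self.adj[v])

    def neighbor(self, v, i):       # neighbor query: i-th neighbor of v
        return self.adj[v][i]

    def pair(self, u, v):           # pair query: is (u, v) an edge?
        return frozenset((u, v)) in self.edge_set

    def sample_edge(self, rng):     # edge-sample query: uniform random edge
        return rng.choice(self.edges)


def estimate_triangles(graph, m, trials, rng):
    """Average of an unbiased one-sample estimator of the triangle count.

    Each trial samples an edge (u, v), picks a uniform neighbor w of the
    lower-degree endpoint u, and scores m * deg(u) / 3 if (u, v, w) is a
    triangle. Each triangle is reachable from its 3 edges, hence the / 3.
    """
    total = 0.0
    for _ in range(trials):
        u, v = graph.sample_edge(rng)
        if graph.degree(u) > graph.degree(v):
            u, v = v, u
        d = graph.degree(u)
        w = graph.neighbor(u, rng.randrange(d))
        if w != v and graph.pair(v, w):
            total += m * d / 3
    return total / trials


k4 = QueryGraph([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
print(estimate_triangles(k4, m=6, trials=20000, rng=random.Random(0)))
```

On the complete graph K4, which contains 4 triangles, the printed estimate should be close to 4; on a triangle-free graph every trial scores 0.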

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Sublinear-time algorithms
  • Subgraph counting
  • AGM bound

References

  1. Nesreen K. Ahmed, Jennifer Neville, and Ramana Rao Kompella. Network Sampling: From Static to Streaming Graphs. TKDD, 8(2):7:1-7:56, 2013.
  2. Maryam Aliakbarpour, Amartya Shankha Biswas, Themis Gouleakis, John Peebles, Ronitt Rubinfeld, and Anak Yodpinyanee. Sublinear-Time Algorithms for Counting Star Subgraphs via Edge Sampling. Algorithmica, 80(2):668-697, 2018.
  3. Noga Alon. On the number of subgraphs of prescribed type of graphs with a given number of edges. Israel Journal of Mathematics, 1981.
  4. Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. arXiv, abs/1811.07780, 2018. URL: http://arxiv.org/abs/1811.07780.
  5. Albert Atserias, Martin Grohe, and Dániel Marx. Size Bounds and Query Plans for Relational Joins. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pages 739-748. IEEE Computer Society, 2008. URL: http://dx.doi.org/10.1109/FOCS.2008.43.
  6. Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6-8, 2002, San Francisco, CA, USA, pages 623-632, 2002.
  7. Suman K. Bera and Amit Chakrabarti. Towards Tighter Space Bounds for Counting Triangles and Other Substructures in Graph Streams. In 34th Symposium on Theoretical Aspects of Computer Science, STACS 2017, March 8-11, 2017, Hannover, Germany, pages 11:1-11:14, 2017.
  8. E. Bloedorn, N. Rothleder, D. DeBarr, and L. Rosen. Relational Graph Analysis with Real-World Constraints: An Application in IRS Tax Fraud Detection. In AAAI, 2005.
  9. Vladimir Braverman, Rafail Ostrovsky, and Dan Vilenchik. How Hard Is Counting Triangles in the Streaming Model? In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8-12, 2013, Proceedings, Part I, pages 244-254, 2013.
  10. Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. Counting triangles in data streams. In Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 26-28, 2006, Chicago, Illinois, USA, pages 253-262, 2006.
  11. S. Burt. Structural Holes and Good Ideas. The American Journal of Sociology, 110(2):349-399, 2004. URL: http://dx.doi.org/10.2307/3568221.
  12. Bernard Chazelle, Ronitt Rubinfeld, and Luca Trevisan. Approximating the Minimum Spanning Tree Weight in Sublinear Time. SIAM J. Comput., 34(6):1370-1379, 2005.
  13. Graham Cormode and Hossein Jowhari. A second look at counting triangles in graph streams (corrected). Theor. Comput. Sci., 683:22-30, 2017.
  14. Artur Czumaj, Funda Ergün, Lance Fortnow, Avner Magen, Ilan Newman, Ronitt Rubinfeld, and Christian Sohler. Approximating the Weight of the Euclidean Minimum Spanning Tree in Sublinear Time. SIAM J. Comput., 35(1):91-109, 2005.
  15. Artur Czumaj and Christian Sohler. Estimating the weight of metric minimum spanning trees in sublinear-time. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 175-183, 2004.
  16. Talya Eden, Amit Levi, Dana Ron, and C. Seshadhri. Approximately Counting Triangles in Sublinear Time. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 614-633, 2015.
  17. Talya Eden, Dana Ron, and C. Seshadhri. Sublinear Time Estimation of Degree Distribution Moments: The Degeneracy Connection. In 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland, pages 7:1-7:13, 2017.
  18. Talya Eden, Dana Ron, and C. Seshadhri. On approximating the number of k-cliques in sublinear time. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 722-734, 2018.
  19. Talya Eden and Will Rosenbaum. Lower Bounds for Approximating Graph Parameters via Communication Complexity. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2018, August 20-22, 2018 - Princeton, NJ, USA, pages 11:1-11:18, 2018.
  20. Talya Eden and Will Rosenbaum. On Sampling Edges Almost Uniformly. In 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA, pages 7:1-7:9, 2018.
  21. Uriel Feige. On sums of independent random variables with unbounded variance, and estimating the average degree in a graph. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 594-603, 2004.
  22. Ehud Friedgut and Jeff Kahn. On the number of copies of one hypergraph in another. Israel Journal of Mathematics, 1998.
  23. Oded Goldreich. Introduction to Property Testing. Cambridge University Press, 2017.
  24. Oded Goldreich and Dana Ron. Approximating average parameters of graphs. Random Struct. Algorithms, 32(4):473-493, 2008.
  25. Mira Gonen, Dana Ron, and Yuval Shavitt. Counting Stars and Other Small Subgraphs in Sublinear Time. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 99-116, 2010.
  26. Avinatan Hassidim, Jonathan A. Kelner, Huy N. Nguyen, and Krzysztof Onak. Local Graph Partitions for Approximation and Testing. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, October 25-27, 2009, Atlanta, Georgia, USA, pages 22-31, 2009.
  27. Madhav Jha, C. Seshadhri, and Ali Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013, pages 589-597, 2013.
  28. Hossein Jowhari and Mohammad Ghodsi. New Streaming Algorithms for Counting Triangles in Graphs. In Computing and Combinatorics, 11th Annual International Conference, COCOON 2005, Kunming, China, August 16-29, 2005, Proceedings, pages 710-716, 2005.
  29. John Kallaugher, Michael Kapralov, and Eric Price. The Sketching Complexity of Graph and Hypergraph Counting. CoRR, abs/1808.04995, 2018. To appear in FOCS 2018.
  30. John Kallaugher and Eric Price. A Hybrid Sampling Scheme for Triangle Counting. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 1778-1797, 2017.
  31. Daniel M. Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. Counting Arbitrary Subgraphs in Data Streams. In Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part II, pages 598-609, 2012.
  32. Tali Kaufman, Michael Krivelevich, and Dana Ron. Tight Bounds for Testing Bipartiteness in General Graphs. SIAM J. Comput., 33(6):1441-1483, 2004.
  33. Ju-Sung Lee and Jürgen Pfeffer. Estimating Centrality Statistics for Complete and Sampled Networks: Some Approaches and Complications. In 48th Hawaii International Conference on System Sciences, HICSS 2015, Kauai, Hawaii, USA, January 5-8, 2015, pages 1686-1695, 2015.
  34. Jure Leskovec and Christos Faloutsos. Sampling from large graphs. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, pages 631-636, 2006.
  35. Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. Better Algorithms for Counting Triangles in Data Streams. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 401-411, 2016.
  36. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824-827, October 2002.
  37. Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case Optimal Join Algorithms. J. ACM, 65(3):16:1-16:40, 2018.
  38. Huy N. Nguyen and Krzysztof Onak. Constant-Time Approximation Algorithms via Local Improvements. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pages 327-336, 2008.
  39. Krzysztof Onak, Dana Ron, Michal Rosen, and Ronitt Rubinfeld. A near-optimal sublinear-time algorithm for approximating the minimum vertex cover size. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1123-1131, 2012.
  40. Michal Parnas and Dana Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theor. Comput. Sci., 381(1-3):183-196, 2007.
  41. Olivia Simpson, C. Seshadhri, and Andrew McGregor. Catching the Head, Tail, and Everything in Between: A Streaming Algorithm for the Degree Distribution. In 2015 IEEE International Conference on Data Mining, ICDM 2015, Atlantic City, NJ, USA, November 14-17, 2015, pages 979-984, 2015.
  42. Johan Ugander, Lars Backstrom, and Jon Kleinberg. Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections. In Proceedings of the 22nd International Conference on World Wide Web, WWW '13, pages 1307-1318, Republic and Canton of Geneva, Switzerland, 2013. International World Wide Web Conferences Steering Committee. URL: http://dl.acm.org/citation.cfm?id=2488388.2488502.
  43. Yuichi Yoshida, Masaki Yamamoto, and Hiro Ito. An improved constant-time approximation algorithm for maximum matchings. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 225-234, 2009.