Towards a Decomposition-Optimal Algorithm for Counting and Sampling Arbitrary Motifs in Sublinear Time

Authors Amartya Shankha Biswas, Talya Eden , Ronitt Rubinfeld



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2021.55.pdf
  • Filesize: 0.95 MB
  • 19 pages

Document Identifiers

Author Details

Amartya Shankha Biswas
  • CSAIL, Massachusetts Institute of Technology, Cambridge MA, USA
Talya Eden
  • CSAIL, Massachusetts Institute of Technology, Cambridge MA, USA
  • sites.google.com/view/edentalya/home
Ronitt Rubinfeld
  • CSAIL, Massachusetts Institute of Technology, Cambridge MA, USA

Acknowledgements

Talya Eden is thankful to Dana Ron and Oded Goldreich for their valuable suggestions regarding the presentation of the lower bound results. The authors are thankful for the anonymous reviewers for their useful comments and observations.

Cite AsGet BibTex

Amartya Shankha Biswas, Talya Eden, and Ronitt Rubinfeld. Towards a Decomposition-Optimal Algorithm for Counting and Sampling Arbitrary Motifs in Sublinear Time. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 55:1-55:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.55

Abstract

We consider the problem of sampling and approximately counting an arbitrary given motif H in a graph G, where access to G is given via queries: degree, neighbor, and pair, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of H into a collection of odd cycles and stars, denoted D^*(H) = {O_{k₁},...,O_{k_q}, S_{p₁},...,S_{p_𝓁}}. These algorithms were shown to be optimal for the case where H is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling arbitrary motifs which, up to poly(log n) factors, is always at least as good, and for most graphs G is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows to sample vertices according to the p-th moment of the degree distribution. Finally, we prove that this algorithm is decomposition-optimal for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs H with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Sublinear time algorithms
  • Graph algorithms
  • Sampling subgraphs
  • Approximate counting

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Nesreen K Ahmed, Jennifer Neville, Ryan A Rossi, and Nick Duffield. Efficient graphlet counting for large networks. In 2015 IEEE International Conference on Data Mining, pages 1-10. IEEE, 2015. Google Scholar
  2. Maryam Aliakbarpour, Amartya Shankha Biswas, Themis Gouleakis, John Peebles, Ronitt Rubinfeld, and Anak Yodpinyanee. Sublinear-time algorithms for counting star subgraphs via edge sampling. Algorithmica, 80(2):668-697, 2018. Google Scholar
  3. Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. In Avrim Blum, editor, 10th Innovations in Theoretical Computer Science Conference (ITCS 2019), volume 124 of Leibniz International Proceedings in Informatics (LIPIcs), pages 6:1-6:20, Dagstuhl, Germany, 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ITCS.2019.6.
  4. Albert Atserias, Martin Grohe, and Dániel Marx. Size bounds and query plans for relational joins. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 739-748. IEEE, 2008. Google Scholar
  5. Haim Avron. Counting triangles in large graphs using randomized matrix trace estimation. In Workshop on Large-scale Data Mining: Theory and Applications, volume 10, pages 10-9, 2010. Google Scholar
  6. Paul Beame, Sariel Har-Peled, Sivaramakrishnan Natarajan Ramamoorthy, Cyrus Rashtchian, and Makrand Sinha. Edge estimation with independent set oracles. arXiv preprint arXiv:1711.07567, 2017. Google Scholar
  7. Suman K. Bera, Noujan Pashanasangi, and C. Seshadhri. Linear time subgraph counting, graph degeneracy, and the chasm at size six. In 11th Innovations in Theoretical Computer Science Conference, ITCS 2020, January 12-14, 2020, Seattle, Washington, USA, pages 38:1-38:20, 2020. URL: https://doi.org/10.4230/LIPIcs.ITCS.2020.38.
  8. Amartya Shankha Biswas, Talya Eden, and Ronitt Rubinfeld. Towards a decomposition-optimal algorithm for counting and sampling arbitrary motifs in sublinear time, 2021. URL: http://arxiv.org/abs/2107.06582.
  9. Andreas Bjöklund, Thore Husfeldt, Petteri Kaski, and Mikko Koivisto. Counting paths and packings in halves. Algorithms - ESA 2009, page 578–586, 2009. URL: https://doi.org/10.1007/978-3-642-04128-0_52.
  10. Xi Chen, Amit Levi, and Erik Waingarten. Nearly optimal edge estimation with independent set queries. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 2916-2935, 2020. URL: https://doi.org/10.1137/1.9781611975994.177.
  11. Graham Cormode and Hossein Jowhari. L p samplers and their applications: A survey. ACM Computing Surveys (CSUR), 52(1):1-31, 2019. Google Scholar
  12. Maximilien Danisch, Oana Balalau, and Mauro Sozio. Listing k-cliques in sparse real-world graphs. In Proceedings of the 2018 World Wide Web Conference, pages 589-598. International World Wide Web Conferences Steering Committee, 2018. Google Scholar
  13. Talya Eden, Amit Levi, Dana Ron, and C Seshadhri. Approximately counting triangles in sublinear time. SIAM Journal on Computing, 46(5):1603-1646, 2017. Google Scholar
  14. Talya Eden, Dana Ron, and Will Rosenbaum. The arboricity captures the complexity of sampling edges. In 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece., pages 52:1-52:14, 2019. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.52.
  15. Talya Eden, Dana Ron, and Will Rosenbaum. Almost optimal bounds for sublinear-time sampling of k-cliques: Sampling cliques is harder than counting. arXiv preprint arXiv:2012.04090, 2020. Google Scholar
  16. Talya Eden, Dana Ron, and C. Seshadhri. On approximating the number of k-cliques in sublinear time. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 722-734, 2018. URL: https://doi.org/10.1145/3188745.3188810.
  17. Talya Eden, Dana Ron, and C. Seshadhri. Sublinear time estimation of degree distribution moments: The arboricity connection. SIAM J. Discrete Math., 33(4):2267-2285, 2019. URL: https://doi.org/10.1137/17M1159014.
  18. Talya Eden, Dana Ron, and C. Seshadhri. Faster sublinear approximation of the number of k-cliques in low-arboricity graphs. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1467-1478, 2020. URL: https://doi.org/10.1137/1.9781611975994.89.
  19. Talya Eden and Will Rosenbaum. Lower bounds for approximating graph parameters via communication complexity. In Eric Blais, Klaus Jansen, José D. P. Rolim, and David Steurer, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2018, August 20-22, 2018 - Princeton, NJ, USA, volume 116 of LIPIcs, pages 11:1-11:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.11.
  20. Talya Eden and Will Rosenbaum. On sampling edges almost uniformly. In Raimund Seidel, editor, 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA, volume 61 of OASICS, pages 7:1-7:9. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/OASIcs.SOSA.2018.7.
  21. Patrick Eichenberger, Masaya Fujita, Shane T Jensen, Erin M Conlon, David Z Rudner, Stephanie T Wang, Caitlin Ferguson, Koki Haga, Tsutomu Sato, Jun S Liu, et al. The program of gene transcription for a single differentiating cell type during sporulation in bacillus subtilis. PLoS biology, 2(10):e328, 2004. Google Scholar
  22. Uriel Feige. On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SIAM Journal on Computing, 35(4):964-984, 2006. Google Scholar
  23. Hendrik Fichtenberger, Mingze Gao, and Pan Peng. Sampling arbitrary subgraphs exactly uniformly in sublinear time. In 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), pages 45:1-45:13, 2020. URL: https://doi.org/10.4230/LIPIcs.ICALP.2020.45.
  24. Jacob Fox, Tim Roughgarden, C. Seshadhri, Fan Wei, and Nicole Wein. Finding cliques in social networks: A new distribution-free model. SIAM J. Comput., 49(2):448-464, 2020. URL: https://doi.org/10.1137/18M1210459.
  25. Oded Goldreich and Dana Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473-493, 2008. URL: https://doi.org/10.1002/rsa.20203.
  26. Mira Gonen, Dana Ron, and Yuval Shavitt. Counting stars and other small subgraphs in sublinear-time. SIAM Journal on Discrete Mathematics, 25(3):1365-1411, 2011. Google Scholar
  27. Shweta Jain and C. Seshadhri. A fast and provable method for estimating clique counts using turán’s theorem. In Conference on the World Wide Web, pages 441-449, 2017. Google Scholar
  28. Krzysztof Juszczyszyn, Przemysław Kazienko, and Katarzyna Musiał. Local topology of social network based on motif analysis. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pages 97-105. Springer, 2008. Google Scholar
  29. Tali Kaufman, Michael Krivelevich, and Dana Ron. Tight bounds for testing bipartiteness in general graphs. SIAM Journal on Computing, 33(6):1441-1483, 2004. URL: https://doi.org/10.1137/S0097539703436424.
  30. Tong Ihn Lee, Nicola J Rinaldi, François Robert, Duncan T Odom, Ziv Bar-Joseph, Georg K Gerber, Nancy M Hannett, Christopher T Harbison, Craig M Thompson, Itamar Simon, et al. Transcriptional regulatory networks in saccharomyces cerevisiae. science, 298(5594):799-804, 2002. Google Scholar
  31. Wenzhe Ma, Ala Trusina, Hana El-Samad, Wendell A Lim, and Chao Tang. Defining network topologies that can achieve biochemical adaptation. Cell, 138(4):760-773, 2009. Google Scholar
  32. DE Nelson, AEC Ihekwaba, M Elliott, JR Johnson, CA Gibney, BE Foreman, G Nelson, V See, CA Horton, DG Spiller, et al. Oscillations in nf-κb signaling control the dynamics of gene expression. Science, 306(5696):704-708, 2004. Google Scholar
  33. Duncan T Odom, Nora Zizlsperger, D Benjamin Gordon, George W Bell, Nicola J Rinaldi, Heather L Murray, Tom L Volkert, Jörg Schreiber, P Alexander Rolfe, David K Gifford, et al. Control of pancreas and liver gene expression by hnf transcription factors. Science, 303(5662):1378-1381, 2004. Google Scholar
  34. Rasmus Pagh and Charalampos E Tsourakakis. Colorful triangle counting and a mapreduce implementation. Information Processing Letters, 112:277-281, 2012. Google Scholar
  35. Ashwin Paranjape, Austin R Benson, and Jure Leskovec. Motifs in temporal networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pages 601-610. ACM, 2017. Google Scholar
  36. Shai S Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. Network motifs in the transcriptional regulation network of escherichia coli. Nature genetics, 31(1):64, 2002. Google Scholar
  37. Jakub Tětek and Mikkel Thorup. Sampling and counting edges via vertex accesses, 2021. Google Scholar
  38. Alexandru Topirceanu, Alexandra Duma, and Mihai Udrescu. Uncovering the fingerprint of online social networks using a network motif based approach. Computer Communications, 73:167-175, 2016. Google Scholar
  39. Charalampos E Tsourakakis. Fast counting of triangles in large real networks without counting: Algorithms and laws. In International Conference on Data Mining, pages 608-617, 2008. Google Scholar
  40. Jakub Tětek. Approximate triangle counting via sampling and fast matrix multiplication. CoRR, abs/2104.08501, 2021. URL: http://arxiv.org/abs/2104.08501.
  41. John J Tyson and Béla Novák. Functional motifs in biochemical reaction networks. Annual review of physical chemistry, 61:219-240, 2010. Google Scholar
  42. Virginia Vassilevska. Efficient algorithms for clique problems. Information Processing Letters, 109(4):254-257, 2009. URL: https://doi.org/10.1016/j.ipl.2008.10.014.
  43. Qiankun Zhao, Yuan Tian, Qi He, Nuria Oliver, Ruoming Jin, and Wang-Chien Lee. Communication motifs: a tool to characterize social communications. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1645-1648. ACM, 2010. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail