Sampling Multiple Edges Efficiently

Authors Talya Eden , Saleet Mossel, Ronitt Rubinfeld



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2021.51.pdf
  • Filesize: 0.79 MB
  • 15 pages

Document Identifiers

Author Details

Talya Eden
  • CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
Saleet Mossel
  • CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
Ronitt Rubinfeld
  • CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA

Cite AsGet BibTex

Talya Eden, Saleet Mossel, and Ronitt Rubinfeld. Sampling Multiple Edges Efficiently. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 51:1-51:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.51

Abstract

We present a sublinear time algorithm that allows one to sample multiple edges from a distribution that is pointwise ε-close to the uniform distribution, in an amortized-efficient fashion. We consider the adjacency list query model, where access to a graph G is given via degree and neighbor queries. The problem of sampling a single edge in this model has been raised by Eden and Rosenbaum (SOSA 18). Let n and m denote the number of vertices and edges of G, respectively. Eden and Rosenbaum provided upper and lower bounds of Θ^*(n/√ m) for sampling a single edge in general graphs (where O^*(⋅) suppresses poly(1/ε) and poly(log n) dependencies). We ask whether the query complexity lower bound for sampling a single edge can be circumvented when multiple samples are required. That is, can we get an improved amortized per-sample cost if we allow a preprocessing phase? We answer in the affirmative. We present an algorithm that, if one knows the number of required samples q in advance, has an overall cost that is sublinear in q, namely, O^*(√ q ⋅(n/√ m)), which is strictly preferable to O^*(q⋅ (n/√ m)) cost resulting from q invocations of the algorithm by Eden and Rosenbaum. Subsequent to a preliminary version of this work, Tětek and Thorup (arXiv, preprint) proved that this bound is essentially optimal.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sketching and sampling
Keywords
  • Sampling edges
  • graph algorithm
  • sublinear algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Nesreen K Ahmed, Nick Duffield, Theodore L Willke, and Ryan A Rossi. On sampling from massive graph streams. Proceedings of the VLDB Endowment, 10(11), 2017. Google Scholar
  2. Nesreen K Ahmed, Jennifer Neville, and Ramana Kompella. Network sampling: From static to streaming graphs. ACM Transactions on Knowledge Discovery from Data (TKDD), 8(2):1-56, 2013. Google Scholar
  3. Maryam Aliakbarpour, Amartya Shankha Biswas, Themis Gouleakis, John Peebles, Ronitt Rubinfeld, and Anak Yodpinyanee. Sublinear-time algorithms for counting star subgraphs via edge sampling. Algorithmica, 80(2):668-697, 2018. Google Scholar
  4. Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A simple sublinear-time algorithm for counting arbitrary subgraphs via edge sampling. In Innovations in Theoretical Computer Science Conference ITCS, volume 124 of LIPIcs, pages 6:1-6:20. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019. Google Scholar
  5. Amartya Shankha Biswas, Talya Eden, and Ronitt Rubinfeld. Towards a decomposition-optimal algorithm for counting and sampling arbitrary motifs in sublinear time. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2021, to appear, 2021. Google Scholar
  6. Colin Cooper, Tomasz Radzik, and Yiannis Siantos. Estimating network parameters using random walks. Social Network Analysis and Mining, 4(1):168, 2014. Google Scholar
  7. Talya Eden, Dana Ron, and Will Rosenbaum. The arboricity captures the complexity of sampling edges. In 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece., pages 52:1-52:14, 2019. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.52.
  8. Talya Eden, Dana Ron, and Will Rosenbaum. Almost optimal bounds for sublinear-time sampling of k-cliques: Sampling cliques is harder than counting, 2020. URL: http://arxiv.org/abs/2012.04090.
  9. Talya Eden, Dana Ron, and C Seshadhri. Sublinear time estimation of degree distribution moments: The arboricity connection. SIAM Journal on Discrete Mathematics, 33(4):2267-2285, 2019. Google Scholar
  10. Talya Eden, Dana Ron, and C Seshadhri. Faster sublinear approximation of the number of k-cliques in low-arboricity graphs. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1467-1478. SIAM, 2020. Google Scholar
  11. Talya Eden, Dana Ron, and C Seshadhri. On approximating the number of k-cliques in sublinear time. SIAM Journal on Computing, 49(4):747-771, 2020. Google Scholar
  12. Talya Eden and Will Rosenbaum. Lower bounds for approximating graph parameters via communication complexity. In Eric Blais, Klaus Jansen, José D. P. Rolim, and David Steurer, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2018, August 20-22, 2018 - Princeton, NJ, USA, volume 116 of LIPIcs, pages 11:1-11:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.11.
  13. Talya Eden and Will Rosenbaum. On sampling edges almost uniformly. In Raimund Seidel, editor, 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA, volume 61 of OASICS, pages 7:1-7:9. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/OASIcs.SOSA.2018.7.
  14. Uriel Feige. On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SIAM Journal on Computing, 35(4):964-984, 2006. Google Scholar
  15. Hendrik Fichtenberger, Mingze Gao, and Pan Peng. Sampling arbitrary subgraphs exactly uniformly in sublinear time. In Artur Czumaj, Anuj Dawar, and Emanuela Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), volume 168 of LIPIcs, pages 45:1-45:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.ICALP.2020.45.
  16. Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. Walking in facebook: A case study of unbiased sampling of osns. In INFOCOM 2010. 29th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 15-19 March 2010, San Diego, CA, USA, pages 2498-2506. IEEE, 2010. URL: https://doi.org/10.1109/INFCOM.2010.5462078.
  17. Oded Goldreich and Dana Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473-493, 2008. URL: https://doi.org/10.1002/rsa.20203.
  18. Mira Gonen, Dana Ron, and Yuval Shavitt. Counting stars and other small subgraphs in sublinear-time. SIAM Journal on Discrete Mathematics, 25(3):1365-1411, 2011. Google Scholar
  19. Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 49-58, 2011. Google Scholar
  20. Nadav Kashtan, Shalev Itzkovitz, Ron Milo, and Uri Alon. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics, 20(11):1746-1758, 2004. Google Scholar
  21. Tali Kaufman, Michael Krivelevich, and Dana Ron. Tight bounds for testing bipartiteness in general graphs. SIAM Journal on Computing, 33(6):1441-1483, 2004. URL: https://doi.org/10.1137/S0097539703436424.
  22. Jure Leskovec and Christos Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pages 631-636, New York, NY, USA, 2006. ACM. URL: https://doi.org/10.1145/1150402.1150479.
  23. George Marsaglia, Wai Wan Tsang, Jingbo Wang, et al. Fast generation of discrete random variables. Journal of Statistical Software, 11(3):1-11, 2004. Google Scholar
  24. Abedelaziz Mohaisen, Aaram Yun, and Yongdae Kim. Measuring the mixing time of social graphs. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, pages 383-389, 2010. Google Scholar
  25. Bruno Ribeiro and Don Towsley. Estimating and sampling graphs with multidimensional random walks. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, pages 390-403, 2010. Google Scholar
  26. Jakub Tětek and Mikkel Thorup. Sampling and counting edges via vertex accesses. arXiv preprint arXiv:2107.03821, 2021. Google Scholar
  27. Duru Türkoglu and Ata Turk. Edge-based wedge sampling to estimate triangle counts in very large graphs. In 2017 IEEE International Conference on Data Mining (ICDM), pages 455-464. IEEE, 2017. Google Scholar
  28. Jakub Tětek. Approximate triangle counting via sampling and fast matrix multiplication. CoRR, abs/2104.08501, 2021. URL: http://arxiv.org/abs/2104.08501.
  29. Alastair J. Walker. New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters, 10(8):127-128, 1974. Google Scholar
  30. Alastair J. Walker. An efficient method for generating discrete random variables with general distributions. ACM Transactions on Mathematical Software, 3(3):253-256, 1977. Google Scholar
  31. Tianyi Wang, Yang Chen, Zengbin Zhang, Tianyin Xu, Long Jin, Pan Hui, Beixing Deng, and Xing Li. Understanding graph sampling algorithms for social network analysis. In 2011 31st international conference on distributed computing systems workshops, pages 123-128. IEEE, 2011. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail