Sampling Arbitrary Subgraphs Exactly Uniformly in Sublinear Time

Fichtenberger, Hendrik; Gao, Mingze; Peng, Pan

doi:10.4230/LIPIcs.ICALP.2020.45

File

LIPIcs.ICALP.2020.45.pdf

Filesize: 0.53 MB
13 pages

Document Identifiers

DOI: 10.4230/LIPIcs.ICALP.2020.45
URN: urn:nbn:de:0030-drops-124526

Author Details

Hendrik Fichtenberger

Department of Computer Science, TU Dortmund, Germany

Mingze Gao

Department of Computer Science, University of Sheffield, UK

Pan Peng

Department of Computer Science, University of Sheffield, UK

Acknowledgements

We would like to thank the anonymous reviewers for their detailed comments. In particular, we would like to thank an anonymous reviewer for their suggestion to improve the presentation of the proof of Theorem 2 and their comment on applications, which we included as future work.

Cite AsGet BibTex

Hendrik Fichtenberger, Mingze Gao, and Pan Peng. Sampling Arbitrary Subgraphs Exactly Uniformly in Sublinear Time. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 45:1-45:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ICALP.2020.45

Abstract

We present a simple sublinear-time algorithm for sampling an arbitrary subgraph H exactly uniformly from a graph G, to which the algorithm has access by performing the following types of queries: (1) uniform vertex queries, (2) degree queries, (3) neighbor queries, (4) pair queries and (5) edge sampling queries. The query complexity and running time of our algorithm are Õ(min{m, (m^ρ(H))/#H}) and Õ((m^ρ(H))/#H}), respectively, where ρ(H) is the fractional edge-cover of H and #H is the number of copies of H in G. For any clique on r vertices, i.e., H = K_r, our algorithm is almost optimal as any algorithm that samples an H from any distribution that has Ω(1) total probability mass on the set of all copies of H must perform Ω(min{m, (m^ρ(H))/(#H⋅(cr)^r)}) queries. Together with the query and time complexities of the (1±ε)-approximation algorithm for the number of subgraphs H by Assadi et al. [Sepehr Assadi et al., 2018] and the lower bound by Eden and Rosenbaum [Eden and Rosenbaum, 2018] for approximately counting cliques, our results suggest that in our query model, approximately counting cliques is "equivalent to" exactly uniformly sampling cliques, in the sense that the query and time complexities of exactly uniform sampling and randomized approximate counting are within polylogarithmic factor of each other. This stands in interesting contrast to an analogous relation between approximate counting and almost uniformly sampling for self-reducible problems in the polynomial-time regime by Jerrum, Valiant and Vazirani [Jerrum et al., 1986].

Subject Classification

ACM Subject Classification

Theory of computation → Streaming, sublinear and near linear time algorithms

Keywords

Graph sampling
Graph algorithms
Sublinear-time algorithms

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Nesreen K Ahmed, Nick Duffield, Theodore L Willke, and Ryan A Rossi. On sampling from massive graph streams. Proceedings of the VLDB Endowment, 10(11), 2017. URL: https://doi.org/10.14778/3137628.3137651.
Maryam Aliakbarpour, Amartya Shankha Biswas, Themis Gouleakis, John Peebles, Ronitt Rubinfeld, and Anak Yodpinyanee. Sublinear-Time Algorithms for Counting Star Subgraphs via Edge Sampling. Algorithmica, 2017. URL: https://doi.org/10.1007/s00453-017-0287-3.
Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. In Proceedings of the 10th Innovations in Theoretical Computer Science Conference (ITCS), volume 124. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ITCS.2019.6.
Albert Atserias, Martin Grohe, and Dániel Marx. Size bounds and query plans for relational joins. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2008. URL: https://doi.org/10.1109/FOCS.2008.43.
Anna Ben-Hamou, Roberto I Oliveira, and Yuval Peres. Estimating graph parameters via random walks with restarts. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 2018. URL: https://doi.org/10.1137/1.9781611975031.111.
Flavio Chiericetti, Anirban Dasgupta, Ravi Kumar, Silvio Lattanzi, and Tamás Sarlós. On sampling nodes in a network. In Proceedings of the 25th International Conference on World Wide Web, 2016. URL: https://doi.org/10.1145/2872427.2883045.
Talya Eden, Amit Levi, Dana Ron, and C Seshadhri. Approximately counting triangles in sublinear time. SIAM Journal on Computing, 46(5), 2017. URL: https://doi.org/10.1137/15M1054389.
Talya Eden, Dana Ron, and Will Rosenbaum. The Arboricity Captures the Complexity of Sampling Edges. In 46th International Colloquium on Automata, Languages, and Programming (ICALP), volume 132. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.52.
Talya Eden, Dana Ron, and C. Seshadhri. Faster sublinear approximations of k-cliques for low arboricity graphs. CoRR, abs/1811.04425, 2018. URL: http://arxiv.org/abs/1811.04425.
Talya Eden, Dana Ron, and C Seshadhri. On approximating the number of k-cliques in sublinear time. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018. URL: https://doi.org/10.1145/3188745.3188810.
Talya Eden and Will Rosenbaum. On sampling edges almost uniformly. arXiv preprint, 2017. URL: http://arxiv.org/abs/1706.09748.
Talya Eden and Will Rosenbaum. Lower Bounds for Approximating Graph Parameters via Communication Complexity. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM), volume 116. Schloss DagstuhlendashLeibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.11.
Talya Eden and Will Rosenbaum. On Sampling Edges Almost Uniformly. In 1st Symposium on Simplicity in Algorithms (SOSA), volume 61. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/OASIcs.SOSA.2018.7.
Uriel Feige. On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SIAM Journal on Computing, 35(4), 2006. URL: https://doi.org/10.1137/S0097539704447304.
Weiming Feng, Yuxin Sun, and Yitong Yin. What can be sampled locally? In Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), 2017. URL: https://doi.org/10.1007/s00446-018-0332-8.
Oded Goldreich and Dana Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4), 2008. URL: https://doi.org/10.1002/rsa.20203.
Pili Hu and Wing Cheong Lau. A survey and taxonomy of graph sampling. arXiv preprint, 2013. URL: http://arxiv.org/abs/1308.5865.
Mark R Jerrum, Leslie G Valiant, and Vijay V Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43, 1986. URL: https://doi.org/10.5555/11534.11537.
Tali Kaufman, Michael Krivelevich, and Dana Ron. Tight bounds for testing bipartiteness in general graphs. SIAM Journal on Computing, 33(6), 2004. URL: https://doi.org/10.1137/S0097539703436424.
Jure Leskovec and Christos Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006. URL: https://doi.org/10.1145/1150402.1150479.
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594), 2002. URL: https://doi.org/10.1126/science.298.5594.824.
Von Neumann. Various techniques used in connection with random digits. National Bureau of Standards, Applied Math Series, 12, 1950.
Alistair Sinclair and Mark Jerrum. Approximate counting, uniform generation and rapidly mixing markov chains. Information and Computation, 82(1), 1989. URL: https://doi.org/10.1016/0890-5401(89)90067-9.