On the Complexity of Sampling Vertices Uniformly from a Graph

Authors: Flavio Chierichetti, Shahrzad Haddadan


File

LIPIcs.ICALP.2018.149.pdf
  • Filesize: 0.51 MB
  • 13 pages

Author Details

Flavio Chierichetti
  • Dipartimento di Informatica, Sapienza University, Rome, Italy
Shahrzad Haddadan
  • Dipartimento di Informatica, Sapienza University, Rome, Italy

Cite As

Flavio Chierichetti and Shahrzad Haddadan. On the Complexity of Sampling Vertices Uniformly from a Graph. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 149:1-149:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.ICALP.2018.149

Abstract

We study a number of graph exploration problems in the following natural scenario: an algorithm starts exploring an undirected graph from some seed vertex and, for any vertex v that it is aware of, can ask an oracle to return the set of neighbors of v. (In the case of social networks, a call to this oracle corresponds to downloading the profile page of user v.) The goal of the algorithm is either to learn something about the graph (e.g., its average degree) or to return some random function of the graph (e.g., a uniform-at-random vertex), while accessing (downloading) as few vertices of the graph as possible.
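The access model above can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' code: `NeighborOracle` and `explore` are hypothetical names, the graph is an adjacency dictionary hidden inside the oracle, and the cost counts distinct vertices whose neighbor lists were requested (re-querying a vertex is assumed to be free, as if its profile page were cached).

```python
from collections import deque
from typing import Dict, Hashable, List, Set


class NeighborOracle:
    """Hypothetical model of the access oracle: the algorithm learns a vertex's
    neighbors only by querying it, and each distinct queried vertex counts as
    one access (one downloaded profile page)."""

    def __init__(self, adjacency: Dict[Hashable, List[Hashable]]):
        self._adj = adjacency            # the full graph, hidden from the algorithm
        self.accessed: Set[Hashable] = set()

    def neighbors(self, v: Hashable) -> List[Hashable]:
        self.accessed.add(v)             # repeated queries of v are not charged again
        return list(self._adj[v])

    @property
    def cost(self) -> int:
        return len(self.accessed)


def explore(oracle: NeighborOracle, seed: Hashable, budget: int) -> Set[Hashable]:
    """Breadth-first exploration from `seed` that stops once `budget` vertices
    have been accessed; returns every vertex the algorithm is aware of."""
    known = {seed}
    queue = deque([seed])
    while queue and oracle.cost < budget:
        v = queue.popleft()
        for u in oracle.neighbors(v):
            if u not in known:
                known.add(u)
                queue.append(u)
    return known


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    oracle = NeighborOracle(path)
    print(sorted(explore(oracle, 0, budget=2)), oracle.cost)  # prints [0, 1, 2] 2
```

With a budget of two accesses, vertex 2 is already known (it appeared in the neighbor list of vertex 1) but has not been accessed; this gap between "aware of" and "accessed" is what the complexity measures in the abstract count.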
Motivated by practical applications, we study the complexities of a variety of problems in terms of the graph's mixing time $t_{\mathrm{mix}}$ and average degree $d_{\mathrm{avg}}$, two measures that are believed to be quite small in real-world social networks and that have often been used in the applied literature to bound the performance of online exploration algorithms.
Our main result is that the algorithm has to access $\Omega(t_{\mathrm{mix}}\, d_{\mathrm{avg}}\, \varepsilon^{-2} \ln \delta^{-1})$ vertices to obtain, with probability at least $1-\delta$, an $\varepsilon$-additive approximation of the average of a bounded function on the vertices of a graph; this lower bound matches the performance of an algorithm that was proposed in the literature.
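The abstract does not spell out the matching upper-bound algorithm. As a point of reference only, one standard random-walk approach (not necessarily the one referenced) reweights the walk to undo its degree bias: a simple random walk on a connected graph has stationary distribution $\pi(v) = d(v)/2m$, so averaging $f(v)/d(v)$ and $1/d(v)$ along the walk and taking their ratio targets $\frac{1}{n}\sum_v f(v)$. In the sketch below, `estimate_average`, `burn_in`, and `steps` are hypothetical names; choosing `burn_in` around the mixing time and `steps` large enough for the desired accuracy is exactly what the bounds above govern, and the sketch simply takes both as inputs.

```python
import random
from typing import Callable, Dict, Hashable, List

Neighbors = Callable[[Hashable], List[Hashable]]


def estimate_average(neighbors: Neighbors, f: Callable[[Hashable], float],
                     seed_vertex: Hashable, burn_in: int, steps: int,
                     rng: random.Random) -> float:
    """Estimate (1/n) * sum_v f(v) from a single simple random walk.

    The walk's stationary distribution is pi(v) = d(v) / 2m, so weighting each
    visited vertex by 1/d(v) cancels the degree bias:
        E_pi[f(v)/d(v)] / E_pi[1/d(v)] = (sum_v f(v) / 2m) / (n / 2m).
    """
    v = seed_vertex
    for _ in range(burn_in):             # discard early steps (approximate mixing)
        v = rng.choice(neighbors(v))
    num = 0.0
    den = 0.0
    for _ in range(steps):
        nbrs = neighbors(v)              # one oracle query per step
        d = len(nbrs)
        num += f(v) / d
        den += 1.0 / d
        v = rng.choice(nbrs)
    return num / den


if __name__ == "__main__":
    # Star with centre 0 and four leaves; f is the indicator of the centre.
    star: Dict[int, List[int]] = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    est = estimate_average(lambda v: star[v], lambda v: 1.0 if v == 0 else 0.0,
                           seed_vertex=0, burn_in=10, steps=100, rng=random.Random(1))
    print(est)  # prints 0.2
```

On the star the walk spends half its time at the centre, so a naive average of f along the walk would report 0.5; the 1/d(v) reweighting recovers the true vertex average 1/5.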
We also give tight bounds for the problem of returning a close-to-uniform-at-random vertex from the graph. Finally, we give lower bounds for the problems of estimating the average degree and the number of vertices of the graph.
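For intuition about why $t_{\mathrm{mix}}$ and $d_{\mathrm{avg}}$ appear together, here is a minimal sketch of two textbook random-walk techniques; it is not claimed to be the paper's method. It assumes a connected, non-bipartite graph with every degree at least 1, and `walk`, `sample_uniform_vertex`, `estimate_average_degree`, `burn_in`, and `max_trials` are hypothetical names. Rejection sampling accepts a (near-)stationary vertex v with probability 1/d(v), so each vertex is returned with probability $\frac{d(v)}{2m}\cdot\frac{1}{d(v)} = \frac{1}{2m}$, i.e. uniformly, and the acceptance rate is $n/2m = 1/d_{\mathrm{avg}}$; re-mixing for `burn_in` steps before each trial then costs roughly $t_{\mathrm{mix}}\, d_{\mathrm{avg}}$ steps per sample. The same identity $\mathbb{E}_\pi[1/d(v)] = 1/d_{\mathrm{avg}}$ gives a harmonic-mean estimator of the average degree.

```python
import random
from typing import Callable, Hashable, List

Neighbors = Callable[[Hashable], List[Hashable]]


def walk(neighbors: Neighbors, v: Hashable, steps: int, rng: random.Random) -> Hashable:
    """Take `steps` steps of a simple random walk starting at v."""
    for _ in range(steps):
        v = rng.choice(neighbors(v))
    return v


def sample_uniform_vertex(neighbors: Neighbors, start: Hashable, burn_in: int,
                          rng: random.Random, max_trials: int = 100_000) -> Hashable:
    """Rejection sampling on top of the walk: if X ~ pi with pi(v) = d(v)/2m,
    accepting X with probability 1/d(X) returns each vertex w.p. 1/2m."""
    v = start
    for _ in range(max_trials):
        v = walk(neighbors, v, burn_in, rng)       # re-mix before every trial
        if rng.random() < 1.0 / len(neighbors(v)):  # assumes d(v) >= 1
            return v
    raise RuntimeError("no vertex accepted; increase max_trials")


def estimate_average_degree(neighbors: Neighbors, start: Hashable, burn_in: int,
                            steps: int, rng: random.Random) -> float:
    """Harmonic-mean estimator based on E_pi[1/d(v)] = n/2m = 1/d_avg."""
    v = walk(neighbors, start, burn_in, rng)
    inv_deg_sum = 0.0
    for _ in range(steps):
        nbrs = neighbors(v)
        inv_deg_sum += 1.0 / len(nbrs)
        v = rng.choice(nbrs)
    return steps / inv_deg_sum
```

Re-running `burn_in` steps after each rejection is a deliberate choice: simply taking one more step after a rejection would bias the next trial toward neighbors of low-degree vertices, because the rejection event is correlated with the current vertex.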

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Graph algorithms
Keywords
  • Social Networks
  • Sampling
  • Graph Exploration
  • Lower Bounds
