How Large Is Your Graph?

Authors Varun Kanade, Frederik Mallmann-Trenn, Victor Verdugo



PDF
Thumbnail PDF

File

LIPIcs.DISC.2017.34.pdf
  • Filesize: 0.59 MB
  • 16 pages

Document Identifiers

Author Details

Varun Kanade
Frederik Mallmann-Trenn
Victor Verdugo

Cite As Get BibTex

Varun Kanade, Frederik Mallmann-Trenn, and Victor Verdugo. How Large Is Your Graph?. In 31st International Symposium on Distributed Computing (DISC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 91, pp. 34:1-34:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.DISC.2017.34

Abstract

We consider the problem of estimating the graph size, where one is given only local access to the graph. We formally define a query model in which one starts with a seed node and is allowed to make queries about neighbours of nodes that have already been seen. In the case of undirected graphs, an estimator of Katzir et al. (2014) based on a sample from the stationary distribution pi uses O(1/||pi||_2 + d_avg) queries; we prove that this is tight. In addition, we establish this as a lower bound even when the algorithm is allowed to crawl the graph arbitrarily; the results of Katzir et al. give an upper bound that is worse by a multiplicative factor t_mix(1/n^4).

The picture becomes significantly different in the case of directed graphs. We show that without strong assumptions on the graph structure, the number of nodes cannot be predicted to within a constant multiplicative factor without using a number of queries that are at least linear in the number of nodes; in particular, rapid mixing and small diameter, properties that most real-world networks exhibit, do not suffice. The question of interest is whether any algorithm can beat breadth-first search. We introduce a new parameter, generalising the well-studied conductance, such that if a suitable bound on it exists and is known to the algorithm, the number of queries required is sublinear in the number of edges; we show that this is tight.

Subject Classification

Keywords
  • Estimation
  • Random Walks
  • Social Networks

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Michael A. Bender and Data Ron. Testing properties of directed graphs: Acyclicity and connectivity. Random Structures and Algorithms, 20(2):184-205, 2002. Google Scholar
  2. Béla Bollobás. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, 1(4):311-316, 1980. URL: http://dx.doi.org/10.1016/S0195-6698(80)80030-8.
  3. Mickey Brautbar and Michael Kearns. Local algorithms for finding interesting individuals in large networks. In In Proceedings of ICS 2010, pages 188-199, 2010. Google Scholar
  4. Colin Cooper and Alan Frieze. Crawling on web graphs. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 419-427. ACM, 2002. Google Scholar
  5. Colin Cooper, Tomasz Radzik, and Yiannis Siantos. Estimating network parameters using random walks. In Computational Aspects of Social Networks, 4th International Conference on, pages 33-40, 2012. Google Scholar
  6. Artur Czumaj, Pan Peng, and Christian Sohler. Relating two property testing models for bounded degree directed graphs. In Proceedings of the ACM Symposium on the Theory of Computing (STOC), 2016. Google Scholar
  7. Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. On estimating the average degree. In Proceedings of the 23rd international conference on World Wide Web, pages 795-806. ACM, 2014. URL: http://dx.doi.org/10.1145/2566486.2568019.
  8. Mark Finkelstein, Howard G Tucker, and Jerry Alan Veeh. Confidence intervals for the number of unseen types. Statistics &Probability Letters, 37(4):423-430, 1998. Google Scholar
  9. Oded Goldreich. Introduction to testing graph properties. Survey article available at http://www.wisdom.weizmann.ac.il/~oded/COL/tgp-intro.pdf, 2010.
  10. I. J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40(3/4):237-264, 1953. Google Scholar
  11. Varun Kanade, Frederik Mallmann-Trenn, and Victor Verdugo. How large is your graph? CoRR, abs/1702.03959, 2017. URL: http://arxiv.org/abs/1702.03959.
  12. Liran Katzir and Stephen J. Hardiman. Estimating clustering coefficients and size of social networks via random walk. ACM Transactions on the Web, 9(4):19:1-19:20, 2015. Google Scholar
  13. Liran Katzir, Edo Liberty, Oren Somekh, and Ioana A. Cosma. Estimating sizes of social networks via biased sampling. Internet Mathematics, 10(3-4):335-359, 2014. URL: http://dx.doi.org/10.1080/15427951.2013.862883.
  14. Donald E. Knuth. Estimating the efficiency of backtrack programs. Mathematics of computation, 29(129):122-136, 1975. Google Scholar
  15. Alberto Marchetti-Spaccamela. On the estimate of the size of a directed graph. In International Workshop on Graph-Theoretic Concepts in Computer Science, pages 317-326. Springer, 1988. Google Scholar
  16. Milena Mihail, Amin Saberi, and Prasad Tetali. Random walks with lookahead on power law random graphs. Internet Mathematics, 3(2):147-152, 2006. Google Scholar
  17. Cameron Musco, Hsin-Hao Su, and Nancy A. Lynch. Ant-inspired density estimation via random walks: Extended abstract. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, PODC 2016, Chicago, IL, USA, July 25-28, 2016, pages 469-478, 2016. Google Scholar
  18. Leonard Pitt. A note on extending knuth’s tree estimator to directed acyclic graphs. Information processing letters, 24(3):203-206, 1987. Google Scholar
  19. Gregory Valiant and Paul Valiant. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new clts. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 685-694. ACM, 2011. Google Scholar
  20. Gregory Valiant and Paul Valiant. Estimating the unseen: improved estimators for entropy and other properties. In Advances in Neural Information Processing Systems, pages 2157-2165, 2013. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail