Local Access to Huge Random Objects Through Partial Sampling

Authors Amartya Shankha Biswas , Ronitt Rubinfeld, Anak Yodpinyanee



PDF
Thumbnail PDF

File

LIPIcs.ITCS.2020.27.pdf
  • Filesize: 1.39 MB
  • 65 pages

Document Identifiers

Author Details

Amartya Shankha Biswas
  • CSAIL, MIT, Cambridge, MA, USA
Ronitt Rubinfeld
  • CSAIL, MIT, Cambridge, MA, USA
Anak Yodpinyanee
  • CSAIL, MIT, Cambridge, MA, USA

Cite AsGet BibTex

Amartya Shankha Biswas, Ronitt Rubinfeld, and Anak Yodpinyanee. Local Access to Huge Random Objects Through Partial Sampling. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 151, pp. 27:1-27:65, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ITCS.2020.27

Abstract

Consider an algorithm performing a computation on a huge random object (for example a random graph or a "long" random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally "on-the-fly" (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear and thus, sampling the entire object up front would ruin its efficiency. Our first set of results focus on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdös-Rényi G(n,p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg’s Small-World model. Our implementations require no pre-processing time, and answer each query using O(poly(log n)) time, random bits, and additional space. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well bracketed expressions (the Dyck language). Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation, to be consistent with each other without the need for communication, (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree Δ. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n,p) model). The distribution over valid colorings is instead specified via a "huge" input (the underlying graph G), that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for q ≥ 9Δ, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memory-less, and can maintain consistency with non-communicating copies of itself.

Subject Classification

ACM Subject Classification
  • Theory of computation → Generating random combinatorial structures
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • sublinear time algorithms
  • random generation
  • local computation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446-6531, 2017. Google Scholar
  2. Emmanuel Abbe, Afonso S Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471-487, 2016. Google Scholar
  3. Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 670-688. IEEE, 2015. Google Scholar
  4. Maksudul Alam and Maleq Khan. Parallel algorithms for generating random networks with given degree sequences. International Journal of Parallel Programming, 45(1):109-127, 2017. Google Scholar
  5. Noga Alon, Ronitt Rubinfeld, Shai Vardi, and Ning Xie. Space-efficient local computation algorithms. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 1132-1139. Society for Industrial and Applied Mathematics, 2012. Google Scholar
  6. Danielle Smith Bassett and ED Bullmore. Small-world brain networks. The neuroscientist, 12(6):512-523, 2006. Google Scholar
  7. Vladimir Batagelj and Ulrik Brandes. Efficient generation of large random networks. Physical Review E, 71(3):036113, 2005. Google Scholar
  8. Russ Bubley and Martin Dyer. Path coupling: A technique for proving rapid mixing in Markov chains. In Proceedings 38th Annual Symposium on Foundations of Computer Science, pages 223-231. IEEE, 1997. Google Scholar
  9. Irineo Cabreros, Emmanuel Abbe, and Aristotelis Tsirigos. Detecting community structures in Hi-C genomic data. In Information Science and Systems (CISS), 2016 Annual Conference on, pages 584-589. IEEE, 2016. Google Scholar
  10. Yi-Jun Chang, Manuela Fischer, Mohsen Ghaffari, Jara Uitto, and Yufan Zheng. The Complexity of (Δ+ 1) Coloring in Congested Clique, Massively Parallel Computation, and Centralized Local Computation. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, pages 471-480. ACM, 2019. Google Scholar
  11. Jingchun Chen and Bo Yuan. Detecting functional modules in the yeast protein-protein interaction network. Bioinformatics, 22(18):2283-2290, 2006. Google Scholar
  12. Peter Chin, Anup Rao, and Van Vu. Stochastic block model and community detection in sparse graphs: A spectral algorithm with optimal rate of recovery. In Conference on Learning Theory, pages 391-423, 2015. Google Scholar
  13. Melissa S Cline, Michael Smoot, Ethan Cerami, Allan Kuchinsky, Nerius Landys, Chris Workman, Rowan Christmas, Iliana Avila-Campilo, Michael Creech, Benjamin Gross, et al. Integration of biological networks and gene expression data using Cytoscape. Nature protocols, 2(10):2366-2382, 2007. Google Scholar
  14. Mark De Berg, Marc Van Kreveld, Mark Overmars, and Otfried Cheong Schwarzkopf. Computational geometry. In Computational geometry, pages 105-109. Springer, 2000. Google Scholar
  15. Peter Sheridan Dodds, Roby Muhamad, and Duncan J Watts. An experimental study of search in global social networks. science, 301(5634):827-829, 2003. Google Scholar
  16. David Easley and Jon Kleinberg. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press, 2010. Google Scholar
  17. Paul Erdos and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17-60, 1960. Google Scholar
  18. Guy Even, Reut Levi, Moti Medina, and Adi Rosén. Sublinear Random Access Generators for Preferential Attachment Graphs. In 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland, pages 6:1-6:15, 2017. URL: https://doi.org/10.4230/LIPIcs.ICALP.2017.6.
  19. Uriel Feige, Boaz Patt-Shamir, and Shai Vardi. On the probe complexity of local computation algorithms. arXiv preprint, 2017. URL: http://arxiv.org/abs/1703.07734.
  20. Manuela Fischer and Mohsen Ghaffari. A Simple Parallel and Distributed Sampling Technique: Local Glauber Dynamics. In 32nd International Symposium on Distributed Computing (DISC 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Google Scholar
  21. Santo Fortunato. Community detection in graphs. Physics reports, 486(3):75-174, 2010. Google Scholar
  22. Alan Frieze and Eric Vigoda. A survey on the use of Markov chains to randomly sample colourings. Oxford Lecture Series in Mathematics and its Applications, 34:53, 2007. Google Scholar
  23. Anna C Gilbert, Sudipto Guha, Piotr Indyk, Yannis Kotidis, Sivaramakrishnan Muthukrishnan, and Martin J Strauss. Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 389-398. ACM, 2002. Google Scholar
  24. O Goldreich, S Goldwasser, and A Nussboim. On the implementation of huge random objects. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on, pages 68-79. IEEE, 2003. Google Scholar
  25. Oded Goldreich, Shafi Goldwasser, and Silvio Micali. How to construct random functions. J. ACM, 33(4):792-807, 1986. URL: https://doi.org/10.1145/6490.6503.
  26. Oded Goldreich, Shafi Goldwasser, and Asaf Nussboim. On the implementation of huge random objects. SIAM Journal on Computing, 39(7):2761-2822, 2010. Google Scholar
  27. Oded Goldreich, Shari Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM (JACM), 45(4):653-750, 1998. Google Scholar
  28. Oded Goldreich and Dana Ron. Property testing in bounded degree graphs. In Proceedings of the twenty-ninth annual ACM Symposium on Theory of Computing, pages 406-415. ACM, 1997. Google Scholar
  29. Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109-137, 1983. Google Scholar
  30. Daxin Jiang, Chun Tang, and Aidong Zhang. Cluster analysis for gene expression data: a survey. IEEE Transactions on knowledge and data engineering, 16(11):1370-1386, 2004. Google Scholar
  31. Jon Kleinberg. The small-world phenomenon: An algorithmic perspective. In Proceedings of the thirty-second annual ACM Symposium on Theory of Computing, pages 163-170. ACM, 2000. Google Scholar
  32. Donald E Knuth. The art of computer programming, 3rd edn. seminumerical algorithms, vol. 2, 1997. Google Scholar
  33. Greg Linden, Brent Smith, and Jeremy York. Amazon. com recommendations: Item-to-item collaborative filtering. IEEE Internet computing, 7(1):76-80, 2003. Google Scholar
  34. Michael Luby and Charles Rackoff. How to Construct Pseudorandom Permutations from Pseudorandom Functions. SIAM J. Comput., 17(2):373-386, 1988. URL: https://doi.org/10.1137/0217022.
  35. Yishay Mansour, Aviad Rubinstein, Shai Vardi, and Ning Xie. Converting Online Algorithms to Local Computation Algorithms. In Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part I, pages 653-664, 2012. URL: https://doi.org/10.1007/978-3-642-31594-7_55.
  36. Edward M Marcotte, Matteo Pellegrini, Ho-Leung Ng, Danny W Rice, Todd O Yeates, and David Eisenberg. Detecting protein function and protein-protein interactions from genome sequences. Science, 285(5428):751-753, 1999. Google Scholar
  37. Chip Martel and Van Nguyen. Analyzing Kleinberg’s (and other) small-world models. In Proceedings of the twenty-third annual ACM Symposium on Principles of Distributed Computing, pages 179-188. ACM, 2004. Google Scholar
  38. Joel Miller and Aric Hagberg. Efficient generation of networks with given expected degrees. Algorithms and models for the web graph, pages 115-126, 2011. Google Scholar
  39. Ron Milo, Nadav Kashtan, Shalev Itzkovitz, Mark EJ Newman, and Uri Alon. On the uniform generation of random graphs with prescribed degree sequences. arXiv preprint, 2003. URL: http://arxiv.org/abs/cond-mat/0312028.
  40. Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162(3-4):431-461, 2015. Google Scholar
  41. Moni Naor and Asaf Nussboim. Implementing huge sparse random graphs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 596-608. Springer, 2007. Google Scholar
  42. Moni Naor and Omer Reingold. On the Construction of Pseudo-Random Permutations: Luby-Rackoff Revisited. IACR Cryptology ePrint Archive, 1996:11, 1996. URL: http://eprint.iacr.org/1996/011.
  43. Mark EJ Newman. Models of the small world. Journal of Statistical Physics, 101(3):819-841, 2000. Google Scholar
  44. Mark EJ Newman, Duncan J Watts, and Steven H Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(suppl 1):2566-2572, 2002. Google Scholar
  45. Sadegh Nobari, Xuesong Lu, Panagiotis Karras, and Stéphane Bressan. Fast random graph generation. In Proceedings of the 14th International Conference on Extending Database Technology, pages 331-342. ACM, 2011. Google Scholar
  46. Michal Parnas and Dana Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theoretical Computer Science, 381(1-3):183-196, 2007. Google Scholar
  47. Shlomi Reuveni. CATALAN'S TRAPEZOIDS. Probability in the Engineering and Informational Sciences, 28(03):353-361, 2014. Google Scholar
  48. Ronitt Rubinfeld, Gil Tamir, Shai Vardi, and Ning Xie. Fast local computation algorithms. arXiv preprint, 2011. URL: http://arxiv.org/abs/1104.1377.
  49. Shaghayegh Sahebi and William W Cohen. Community-based recommendations: a solution to the cold start problem. In Workshop on recommender systems and the social web, RSWEB, page 60, 2011. Google Scholar
  50. Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000. Google Scholar
  51. Therese Sørlie, Charles M Perou, Robert Tibshirani, Turid Aas, Stephanie Geisler, Hilde Johnsen, Trevor Hastie, Michael B Eisen, Matt Van De Rijn, Stefanie S Jeffrey, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceedings of the National Academy of Sciences, 98(19):10869-10874, 2001. Google Scholar
  52. Joel Spencer. Asymptopia, volume 71. American Mathematical Soc., 2014. Google Scholar
  53. Richard P Stanley. Catalan numbers. Cambridge University Press, 2015. Google Scholar
  54. Jeffrey Travers and Stanley Milgram. The small world problem. Phychology Today, 1:61-67, 1967. Google Scholar
  55. Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440-442, 1998. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail