Local Access to Huge Random Objects Through Partial Sampling
Consider an algorithm performing a computation on a huge random object (for example a random graph or a "long" random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally "on-the-fly" (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear and thus, sampling the entire object up front would ruin its efficiency.
Our first set of results focus on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdös-Rényi G(n,p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg’s Small-World model. Our implementations require no pre-processing time, and answer each query using O(poly(log n)) time, random bits, and additional space.
Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well bracketed expressions (the Dyck language).
Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation, to be consistent with each other without the need for communication, (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree Δ. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n,p) model). The distribution over valid colorings is instead specified via a "huge" input (the underlying graph G), that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for q ≥ 9Δ, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memory-less, and can maintain consistency with non-communicating copies of itself.
sublinear time algorithms
random generation
local computation
Theory of computation~Generating random combinatorial structures
Theory of computation~Streaming, sublinear and near linear time algorithms
27:1-27:65
Regular Paper
Amartya Shankha
Biswas
Amartya Shankha Biswas
CSAIL, MIT, Cambridge, MA, USA
https://orcid.org/0000-0002-5068-1524
MIT Presidential Fellowship
Ronitt
Rubinfeld
Ronitt Rubinfeld
CSAIL, MIT, Cambridge, MA, USA
NSF grants CCF-1650733, CCF-1733808, IIS-1741137 and CCF-1740751
Anak
Yodpinyanee
Anak Yodpinyanee
CSAIL, MIT, Cambridge, MA, USA
https://orcid.org/0000-0002-7572-2003
NSF grants CCF-1650733, CCF-1733808, IIS-1741137 and DPST scholarship, Royal Thai Government
10.4230/LIPIcs.ITCS.2020.27
Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446-6531, 2017.
Emmanuel Abbe, Afonso S Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471-487, 2016.
Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 670-688. IEEE, 2015.
Maksudul Alam and Maleq Khan. Parallel algorithms for generating random networks with given degree sequences. International Journal of Parallel Programming, 45(1):109-127, 2017.
Noga Alon, Ronitt Rubinfeld, Shai Vardi, and Ning Xie. Space-efficient local computation algorithms. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 1132-1139. Society for Industrial and Applied Mathematics, 2012.
Danielle Smith Bassett and ED Bullmore. Small-world brain networks. The neuroscientist, 12(6):512-523, 2006.
Vladimir Batagelj and Ulrik Brandes. Efficient generation of large random networks. Physical Review E, 71(3):036113, 2005.
Russ Bubley and Martin Dyer. Path coupling: A technique for proving rapid mixing in Markov chains. In Proceedings 38th Annual Symposium on Foundations of Computer Science, pages 223-231. IEEE, 1997.
Irineo Cabreros, Emmanuel Abbe, and Aristotelis Tsirigos. Detecting community structures in Hi-C genomic data. In Information Science and Systems (CISS), 2016 Annual Conference on, pages 584-589. IEEE, 2016.
Yi-Jun Chang, Manuela Fischer, Mohsen Ghaffari, Jara Uitto, and Yufan Zheng. The Complexity of (Δ+ 1) Coloring in Congested Clique, Massively Parallel Computation, and Centralized Local Computation. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, pages 471-480. ACM, 2019.
Jingchun Chen and Bo Yuan. Detecting functional modules in the yeast protein-protein interaction network. Bioinformatics, 22(18):2283-2290, 2006.
Peter Chin, Anup Rao, and Van Vu. Stochastic block model and community detection in sparse graphs: A spectral algorithm with optimal rate of recovery. In Conference on Learning Theory, pages 391-423, 2015.
Melissa S Cline, Michael Smoot, Ethan Cerami, Allan Kuchinsky, Nerius Landys, Chris Workman, Rowan Christmas, Iliana Avila-Campilo, Michael Creech, Benjamin Gross, et al. Integration of biological networks and gene expression data using Cytoscape. Nature protocols, 2(10):2366-2382, 2007.
Mark De Berg, Marc Van Kreveld, Mark Overmars, and Otfried Cheong Schwarzkopf. Computational geometry. In Computational geometry, pages 105-109. Springer, 2000.
Peter Sheridan Dodds, Roby Muhamad, and Duncan J Watts. An experimental study of search in global social networks. science, 301(5634):827-829, 2003.
David Easley and Jon Kleinberg. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press, 2010.
Paul Erdos and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17-60, 1960.
Guy Even, Reut Levi, Moti Medina, and Adi Rosén. Sublinear Random Access Generators for Preferential Attachment Graphs. In 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland, pages 6:1-6:15, 2017. URL: https://doi.org/10.4230/LIPIcs.ICALP.2017.6.
https://doi.org/10.4230/LIPIcs.ICALP.2017.6
Uriel Feige, Boaz Patt-Shamir, and Shai Vardi. On the probe complexity of local computation algorithms. arXiv preprint, 2017. URL: http://arxiv.org/abs/1703.07734.
http://arxiv.org/abs/1703.07734
Manuela Fischer and Mohsen Ghaffari. A Simple Parallel and Distributed Sampling Technique: Local Glauber Dynamics. In 32nd International Symposium on Distributed Computing (DISC 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
Santo Fortunato. Community detection in graphs. Physics reports, 486(3):75-174, 2010.
Alan Frieze and Eric Vigoda. A survey on the use of Markov chains to randomly sample colourings. Oxford Lecture Series in Mathematics and its Applications, 34:53, 2007.
Anna C Gilbert, Sudipto Guha, Piotr Indyk, Yannis Kotidis, Sivaramakrishnan Muthukrishnan, and Martin J Strauss. Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 389-398. ACM, 2002.
O Goldreich, S Goldwasser, and A Nussboim. On the implementation of huge random objects. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on, pages 68-79. IEEE, 2003.
Oded Goldreich, Shafi Goldwasser, and Silvio Micali. How to construct random functions. J. ACM, 33(4):792-807, 1986. URL: https://doi.org/10.1145/6490.6503.
https://doi.org/10.1145/6490.6503
Oded Goldreich, Shafi Goldwasser, and Asaf Nussboim. On the implementation of huge random objects. SIAM Journal on Computing, 39(7):2761-2822, 2010.
Oded Goldreich, Shari Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM (JACM), 45(4):653-750, 1998.
Oded Goldreich and Dana Ron. Property testing in bounded degree graphs. In Proceedings of the twenty-ninth annual ACM Symposium on Theory of Computing, pages 406-415. ACM, 1997.
Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109-137, 1983.
Daxin Jiang, Chun Tang, and Aidong Zhang. Cluster analysis for gene expression data: a survey. IEEE Transactions on knowledge and data engineering, 16(11):1370-1386, 2004.
Jon Kleinberg. The small-world phenomenon: An algorithmic perspective. In Proceedings of the thirty-second annual ACM Symposium on Theory of Computing, pages 163-170. ACM, 2000.
Donald E Knuth. The art of computer programming, 3rd edn. seminumerical algorithms, vol. 2, 1997.
Greg Linden, Brent Smith, and Jeremy York. Amazon. com recommendations: Item-to-item collaborative filtering. IEEE Internet computing, 7(1):76-80, 2003.
Michael Luby and Charles Rackoff. How to Construct Pseudorandom Permutations from Pseudorandom Functions. SIAM J. Comput., 17(2):373-386, 1988. URL: https://doi.org/10.1137/0217022.
https://doi.org/10.1137/0217022
Yishay Mansour, Aviad Rubinstein, Shai Vardi, and Ning Xie. Converting Online Algorithms to Local Computation Algorithms. In Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part I, pages 653-664, 2012. URL: https://doi.org/10.1007/978-3-642-31594-7_55.
https://doi.org/10.1007/978-3-642-31594-7_55
Edward M Marcotte, Matteo Pellegrini, Ho-Leung Ng, Danny W Rice, Todd O Yeates, and David Eisenberg. Detecting protein function and protein-protein interactions from genome sequences. Science, 285(5428):751-753, 1999.
Chip Martel and Van Nguyen. Analyzing Kleinberg’s (and other) small-world models. In Proceedings of the twenty-third annual ACM Symposium on Principles of Distributed Computing, pages 179-188. ACM, 2004.
Joel Miller and Aric Hagberg. Efficient generation of networks with given expected degrees. Algorithms and models for the web graph, pages 115-126, 2011.
Ron Milo, Nadav Kashtan, Shalev Itzkovitz, Mark EJ Newman, and Uri Alon. On the uniform generation of random graphs with prescribed degree sequences. arXiv preprint, 2003. URL: http://arxiv.org/abs/cond-mat/0312028.
http://arxiv.org/abs/cond-mat/0312028
Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162(3-4):431-461, 2015.
Moni Naor and Asaf Nussboim. Implementing huge sparse random graphs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 596-608. Springer, 2007.
Moni Naor and Omer Reingold. On the Construction of Pseudo-Random Permutations: Luby-Rackoff Revisited. IACR Cryptology ePrint Archive, 1996:11, 1996. URL: http://eprint.iacr.org/1996/011.
http://eprint.iacr.org/1996/011
Mark EJ Newman. Models of the small world. Journal of Statistical Physics, 101(3):819-841, 2000.
Mark EJ Newman, Duncan J Watts, and Steven H Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(suppl 1):2566-2572, 2002.
Sadegh Nobari, Xuesong Lu, Panagiotis Karras, and Stéphane Bressan. Fast random graph generation. In Proceedings of the 14th International Conference on Extending Database Technology, pages 331-342. ACM, 2011.
Michal Parnas and Dana Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theoretical Computer Science, 381(1-3):183-196, 2007.
Shlomi Reuveni. CATALAN'S TRAPEZOIDS. Probability in the Engineering and Informational Sciences, 28(03):353-361, 2014.
Ronitt Rubinfeld, Gil Tamir, Shai Vardi, and Ning Xie. Fast local computation algorithms. arXiv preprint, 2011. URL: http://arxiv.org/abs/1104.1377.
http://arxiv.org/abs/1104.1377
Shaghayegh Sahebi and William W Cohen. Community-based recommendations: a solution to the cold start problem. In Workshop on recommender systems and the social web, RSWEB, page 60, 2011.
Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000.
Therese Sørlie, Charles M Perou, Robert Tibshirani, Turid Aas, Stephanie Geisler, Hilde Johnsen, Trevor Hastie, Michael B Eisen, Matt Van De Rijn, Stefanie S Jeffrey, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceedings of the National Academy of Sciences, 98(19):10869-10874, 2001.
Joel Spencer. Asymptopia, volume 71. American Mathematical Soc., 2014.
Richard P Stanley. Catalan numbers. Cambridge University Press, 2015.
Jeffrey Travers and Stanley Milgram. The small world problem. Phychology Today, 1:61-67, 1967.
Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440-442, 1998.
Amartya Shankha Biswas, Ronitt Rubinfeld, and Anak Yodpinyanee
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode