A Sublinear Local Access Implementation for the Chinese Restaurant Process

Authors Peter Mörters , Christian Sohler , Stefan Walzer



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2022.28.pdf
  • Filesize: 0.8 MB
  • 18 pages

Document Identifiers

Author Details

Peter Mörters
  • Universität zu Köln, Germany
Christian Sohler
  • Universität zu Köln, Germany
Stefan Walzer
  • Universität zu Köln, Germany

Cite As Get BibTex

Peter Mörters, Christian Sohler, and Stefan Walzer. A Sublinear Local Access Implementation for the Chinese Restaurant Process. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 28:1-28:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2022.28

Abstract

The Chinese restaurant process is a stochastic process closely related to the Dirichlet process that groups sequentially arriving objects into a variable number of classes, such that within each class objects are cyclically ordered. A popular description involves a restaurant, where customers arrive one by one and either sit down next to a randomly chosen customer at one of the existing tables or open a new table. The full state of the process after n steps is given by a permutation of the n objects and cannot be represented in sublinear space. In particular, if we only need specific information about a few objects or classes it would be preferable to obtain the answers without simulating the process completely.
A recent line of research [Oded Goldreich et al., 2010; Moni Naor and Asaf Nussboim, 2007; Amartya Shankha Biswas et al., 2020; Guy Even et al., 2021] attempts to provide access to huge random objects without fully instantiating them. Such local access implementations provide answers to a sequence of queries about the random object, following the same distribution as if the object was fully generated. In this paper, we provide a local access implementation for a generalization of the Chinese restaurant process described above. Our implementation can be used to answer any sequence of adaptive queries about class affiliation of objects, number and sizes of classes at any time, position of elements within a class, or founding time of a class. The running time per query is polylogarithmic in the total size of the object, with high probability. Our approach relies on some ideas from the recent local access implementation for preferential attachment trees by Even et al. [Guy Even et al., 2021]. Such trees are related to the Chinese restaurant process in the sense that both involve a "rich-get-richer" phenomenon. A novel ingredient in our implementation is to embed the process in continuous time, in which the evolution of the different classes becomes stochastically independent [Joyce and Tavaré, 1987]. This independence is used to keep the probabilistic structure manageable even if many queries have already been answered. As similar embeddings are available for a wide range of urn processes [Krishna B. Athreya and Samuel Karlin, 1968], we believe that our approach may be applicable more generally. Moreover, local access implementations for birth and death processes that we encounter along the way may be of independent interest.

Subject Classification

ACM Subject Classification
  • Theory of computation → Generating random combinatorial structures
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Chinese restaurant process
  • Dirichlet process
  • sublinear time algorithm
  • random recursive tree
  • random permutation
  • random partition
  • Ewens distribution
  • simulation
  • local access implementation
  • continuous time embedding

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Noga Alon, Ronitt Rubinfeld, Shai Vardi, and Ning Xie. Space-efficient local computation algorithms. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1132-1139. SIAM, 2012. Google Scholar
  2. Krishna B. Athreya and Samuel Karlin. Embedding of urn schemes into continuous time markov branching processes and related limit theorems. The Annals of Mathematical Statistics, 39(6):1801-1817, 1968. URL: http://www.jstor.org/stable/2239282.
  3. Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509-512, 1999. Google Scholar
  4. Amartya Shankha Biswas, Edward Pyne, and Ronitt Rubinfeld. Local access to random walks. In Mark Braverman, editor, 13th Innovations in Theoretical Computer Science Conference, ITCS, volume 215 of LIPIcs, pages 24:1-24:22, 2022. Google Scholar
  5. Amartya Shankha Biswas, Ronitt Rubinfeld, and Anak Yodpinyanee. Local access to huge random objects through partial sampling. In 11th ITCS, volume 151 of LIPIcs, pages 27:1-27:65. Dagstuhl Publishing, 2020. URL: https://doi.org/10.4230/LIPIcs.ITCS.2020.27.
  6. Jakob Björnberg, Cécile Mailler, Peter Mörters, and Daniel Ueltschi. The weighted chinese restaurant process condenses in at most two tables. In progress., 2022. Google Scholar
  7. Karl Bringmann, Fabian Kuhn, Konstantinos Panagiotou, Ueli Peter, and Henning Thomas. Internal DLA: efficient simulation of a physical growth model - (extended abstract). In 41st ICALP, volume 8572 of LNCS, pages 247-258. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_21.
  8. Harry Crane. The Ubiquitous Ewens Sampling Formula. Statistical Science, 31(1):1-19, 2016. URL: https://doi.org/10.1214/15-STS529.
  9. Michael Drmota. Random Trees: An Interplay between Combinatorics and Probability. Springer, 1st edition, 2009. URL: https://doi.org/10.1007/978-3-211-75357-6.
  10. Richard Durrett. Probability models for DNA sequence evolution. Springer, 2008. URL: http://www.math.duke.edu/~rtd/Gbook/PM4DNA_0317.pdf.
  11. Pál Erdős and Alfréd Rényi. On random graphs. I. Publ. Math., 6:290-297, 1959. Google Scholar
  12. Guy Even, Reut Levi, Moti Medina, and Adi Rosén. Sublinear random access generators for preferential attachment graphs. ACM Trans. Algorithms, 17(4):28:1-28:26, 2021. URL: https://doi.org/10.1145/3464958.
  13. Grzegorz Gluch, Michael Kapralov, Silvio Lattanzi, Aida Mousavifar, and Christian Sohler. Spectral clustering oracles in sublinear time. In Proceedings of the 32nd ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1598-1617. SIAM, 2021. Google Scholar
  14. William Goh and Eric Schmutz. Limit distribution for the maximum degree of a random recursive tree. Journal of Computational and Applied Mathematics, 142(1):61-82, 2002. Probabilistic Methods in Combinatorics and Combinatorial Optimization. URL: https://doi.org/10.1016/S0377-0427(01)00460-5.
  15. Oded Goldreich, Shafi Goldwasser, and Asaf Nussboim. On the implementation of huge random objects. SIAM J. Comput., 39(7):2761-2822, 2010. URL: https://doi.org/10.1137/080722771.
  16. Christoph Grunau, Slobodan Mitrovic, Ronitt Rubinfeld, and Ali Vakilian. Improved local computation algorithm for set cover via sparsification. In Shuchi Chawla, editor, Proceedings of the 31st ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 2993-3011. SIAM, 2020. Google Scholar
  17. Avinatan Hassidim, Jonathan A. Kelner, Huy N. Nguyen, and Krzysztof Onak. Local graph partitions for approximation and testing. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, pages 22-31, 2009. Google Scholar
  18. Avinatan Hassidim, Yishay Mansour, and Shai Vardi. Local computation mechanism design. ACM Trans. Economics and Comput., 4(4):21:1-21:24, 2016. Google Scholar
  19. Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109-137, 1983. Google Scholar
  20. Svante Janson. Functional limit theorems for multitype branching processes and generalized pólya urns. Stochastic Processes and their Applications, 110(2):177-245, 2004. URL: https://doi.org/10.1016/j.spa.2003.12.002.
  21. Paul Joyce and Simon Tavaré. Cycles, permutations and the structure of the yule process with immigration. Stochastic Processes and their Applications, 25:309-314, 1987. URL: https://ideas.repec.org/a/eee/spapps/v25y1987ip309-314.html.
  22. Akash Kumar, C. Seshadhri, and Andrew Stolman. Random walks and forbidden minors III: poly(d/ε)-time partition oracles for minor-free graph classes. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 257-268. IEEE, 2021. Google Scholar
  23. Reut Levi and Dana Ron. A quasi-polynomial time partition oracle for graphs with an excluded minor. ACM Trans. Algorithms, 11(3):24:1-24:13, 2015. Google Scholar
  24. Reut Levi, Dana Ron, and Ronitt Rubinfeld. Local algorithms for sparse spanning graphs. Algorithmica, 82(4):747-786, 2020. Google Scholar
  25. Meng Li and Subhashis Ghosal. Bayesian Multiscale Smoothing of Gaussian Noised Images. Bayesian Analysis, 9(3):733-758, 2014. URL: https://doi.org/10.1214/14-BA871.
  26. Yishay Mansour and Shai Vardi. A local computation approximation scheme to maximum matching. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 16th International Workshop, APPROX, and 17th International Workshop, RANDOM, volume 8096 of Lecture Notes in Computer Science, pages 260-273. Springer, 2013. Google Scholar
  27. M. Medvedovic and S. Sivaganesan. Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics, 18:1194-1206, 2002. Google Scholar
  28. Moni Naor and Asaf Nussboim. Implementing huge sparse random graphs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 10th International Workshop, APPROX, and 11th International Workshop, RANDOM, volume 4627 of Lecture Notes in Computer Science, pages 596-608, 2007. Google Scholar
  29. Zhaohui S. Qin. Clustering microarray gene expression data using weighted Chinese restaurant process. Bioinformatics, 22(16):1988-1997, 2006. URL: https://doi.org/10.1093/bioinformatics/btl284.
  30. Ronitt Rubinfeld, Gil Tamir, Shai Vardi, and Ning Xie. Fast local computation algorithms. In Bernard Chazelle, editor, Innovations in Computer Science - ICS 2011, Tsinghua University, Beijing, China, January 7-9, 2011. Proceedings, pages 223-238. Tsinghua University Press, 2011. Google Scholar
  31. Peter Sanders, Kurt Mehlhorn, Martin Dietzfelbinger, and Roman Dementiev. Sequential and Parallel Algorithms and Data Structures - The Basic Toolbox. Springer, 2019. URL: https://doi.org/10.1007/978-3-030-25209-0.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail