Settling the Complexity of Testing Grainedness of Distributions, and Application to Uniformity Testing in the Huge Object Model

Authors: Clément L. Canonne, Sayantan Sen, Joy Qiping Yang



Author Details

Clément L. Canonne
  • School of Computer Science, University of Sydney, Australia
Sayantan Sen
  • Centre for Quantum Technologies, National University of Singapore, Singapore
Joy Qiping Yang
  • School of Computer Science, University of Sydney, Australia

Acknowledgements

We would like to thank the anonymous reviewers of ITCS 2025 for their suggestions which improved the presentation of the paper. SS would like to thank Clément Canonne and the Theory CS group at the University of Sydney for the warm hospitality during his academic visit, where this work was initiated.

Cite As

Clément L. Canonne, Sayantan Sen, and Joy Qiping Yang. Settling the Complexity of Testing Grainedness of Distributions, and Application to Uniformity Testing in the Huge Object Model. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 26:1-26:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ITCS.2025.26

Abstract

In this work, we study the problem of testing m-grainedness of probability distributions over an n-element universe 𝒰, or, equivalently, of whether a probability distribution is induced by a multiset S ⊆ 𝒰 of size |S| = m. Recently, Goldreich and Ron (Computational Complexity, 2023) proved that Ω(n^c) samples are necessary for testing this property, for any c < 1 and m = Θ(n). They also conjectured that Ω(m/(log m)) samples are necessary for testing this property when m = Θ(n). In this work, we positively settle this conjecture.
Using a known connection to the Distribution over Huge objects (DoHo) model introduced by Goldreich and Ron (TheoretiCS, 2023), we leverage our results to provide improved bounds for uniformity testing in the DoHo model.
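
To make the definition concrete: a distribution is m-grained when every probability is an integer multiple of 1/m, which is what happens when it is induced by a multiset of size m. The following is a minimal sketch of that definition only, not of the paper's tester or lower-bound argument; it assumes the full distribution is known exactly (whereas the testing problem only grants sample access), and the function names and the example multiset are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def distribution_from_multiset(multiset):
    """Distribution induced by a multiset S: p(x) = (multiplicity of x in S) / |S|."""
    m = len(multiset)
    return {x: Fraction(count, m) for x, count in Counter(multiset).items()}


def is_m_grained(dist, m):
    """Exact check: probabilities sum to 1 and each is an integer multiple of 1/m."""
    if sum(dist.values()) != 1:
        return False
    return all((Fraction(p) * m).denominator == 1 for p in dist.values())


# Hypothetical example: a multiset of size m = 4 over the universe {a, b, c}.
S = ["a", "a", "b", "c"]
p = distribution_from_multiset(S)  # a: 1/2, b: 1/4, c: 1/4

print(is_m_grained(p, 4))  # True
print(is_m_grained(p, 3))  # False, since 1/2 is not a multiple of 1/3
```

Exact rational arithmetic (`Fraction`) is used so that the divisibility check is not affected by floating-point rounding.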

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Distribution testing
  • Uniformity testing
  • Huge Object Model
  • Lower bounds

References

  1. Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. Optimal testing for properties of distributions. Advances in Neural Information Processing Systems (NeurIPS), 28, 2015.
  2. Tomer Adar and Eldar Fischer. Refining the adaptivity notion in the huge object model. In International Conference on Randomization and Computation (RANDOM), pages 45:1-45:16, 2024. URL: https://doi.org/10.4230/LIPICS.APPROX/RANDOM.2024.45.
  3. Tomer Adar, Eldar Fischer, and Amit Levi. Support testing in the huge object model. In International Conference on Randomization and Computation (RANDOM), pages 46:1-46:16, 2024. URL: https://doi.org/10.4230/LIPICS.APPROX/RANDOM.2024.46.
  4. Sivaraman Balakrishnan and Larry Wasserman. Hypothesis testing for high-dimensional multinomials: A selective review, 2018.
  5. Tugkan Batu and Clément L Canonne. Generalized uniformity testing. In Symposium on Foundations of Computer Science (FOCS), pages 880-889, 2017. URL: https://doi.org/10.1109/FOCS.2017.86.
  6. Tugkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In Symposium on Foundations of Computer Science (FOCS), pages 442-451, 2001. URL: https://doi.org/10.1109/SFCS.2001.959920.
  7. Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D Smith, and Patrick White. Testing that distributions are close. In Symposium on Foundations of Computer Science (FOCS), pages 259-269, 2000. URL: https://doi.org/10.1109/SFCS.2000.892113.
  8. Clément L Canonne. A survey on distribution testing: Your data is big. But is it blue? Theory of Computing, pages 1-100, 2020.
  9. Clément L Canonne. Topics and techniques in distribution testing: A biased but representative sample. Foundations and Trends® in Communications and Information Theory, pages 1032-1198, 2022. URL: https://doi.org/10.1561/0100000114.
  10. Clément L Canonne, Ilias Diakonikolas, Daniel Kane, and Sihan Liu. Nearly-tight bounds for testing histogram distributions. Advances in Neural Information Processing Systems (NeurIPS), 35:31599-31611, 2022.
  11. Clément L Canonne, Ayush Jain, Gautam Kamath, and Jerry Li. The price of tolerance in distribution testing. In Conference on Learning Theory (COLT), pages 573-624, 2022. URL: https://proceedings.mlr.press/v178/canonne22a.html.
  12. Clément L. Canonne, Sayantan Sen, and Joy Qiping Yang. Settling the complexity of testing grainedness of distributions, and application to uniformity testing in the huge object model. ECCC preprint, 2024. URL: https://eccc.weizmann.ac.il/report/2024/196.
  13. Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, and Sayantan Sen. Testing of index-invariant properties in the huge object model. In Conference on Learning Theory (COLT), pages 3065-3136, 2023. URL: https://proceedings.mlr.press/v195/chakraborty23a.html.
  14. Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Sample-optimal identity testing with high probability. In International Colloquium on Automata, Languages, and Programming (ICALP), 2018.
  15. Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Collision-based testers are optimal for uniformity and closeness. Chic. J. Theor. Comput. Sci, 25:1-21, 2019. URL: http://cjtcs.cs.uchicago.edu/articles/2019/1/contents.html.
  16. Ilias Diakonikolas and Daniel M Kane. A new approach for testing properties of discrete distributions. In Symposium on Foundations of Computer Science (FOCS), pages 685-694, 2016. URL: https://doi.org/10.1109/FOCS.2016.78.
  17. Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. Sharp bounds for generalized uniformity testing. Advances in Neural Information Processing Systems (NeurIPS), 2018.
  18. Oded Goldreich. The uniform distribution is complete with respect to testing identity to a fixed distribution. In Electronic Colloquium on Computational Complexity (ECCC), page 1, 2016.
  19. Oded Goldreich. Introduction to property testing. Cambridge University Press, 2017. URL: https://doi.org/10.1017/9781108135252.
  20. Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM (JACM), pages 653-750, 1998. URL: https://doi.org/10.1145/285055.285060.
  21. Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Electron. Colloquium Comput. Complex., TR00-020, 2000. URL: https://eccc.weizmann.ac.il/eccc-reports/2000/TR00-020/index.html.
  22. Oded Goldreich and Dana Ron. A lower bound on the complexity of testing grained distributions. Computational Complexity (CC), page 11, 2023. URL: https://doi.org/10.1007/S00037-023-00245-W.
  23. Oded Goldreich and Dana Ron. Testing distributions of huge objects. In TheoretiCS, page 78, 2023.
  24. metamorphy. Proving a formula for n-th degree polynomial with n distinct real roots, 2021. URL: https://math.stackexchange.com/questions/4074098/proving-a-formula-for-sum-j-1n-fracx-jkfx-j-for-f-an-n-th-degr.
  25. Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, pages 4750-4755, 2008. URL: https://doi.org/10.1109/TIT.2008.928987.
  26. Sofya Raskhodnikova, Dana Ron, Amir Shpilka, and Adam Smith. Strong lower bounds for approximating distribution support size and the distinct elements problem. SIAM Journal on Computing (SICOMP), pages 813-842, 2009. URL: https://doi.org/10.1137/070701649.
  27. Ronitt Rubinfeld and Madhu Sudan. Robust characterizations of polynomials with applications to program testing. SIAM Journal on Computing (SICOMP), pages 252-271, 1996. URL: https://doi.org/10.1137/S0097539793255151.
  28. Gregory Valiant and Paul Valiant. Estimating the unseen: improved estimators for entropy and other properties. Journal of the ACM (JACM), pages 1-41, 2017. URL: https://doi.org/10.1145/3125643.
  29. Paul Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing (SICOMP), pages 1927-1968, 2011. URL: https://doi.org/10.1137/080734066.
  30. Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory, pages 3702-3720, 2016. URL: https://doi.org/10.1109/TIT.2016.2548468.
  31. Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. The Annals of Statistics, pages 857-883, 2019.
  32. Yihong Wu, Pengkun Yang, et al. Polynomial methods in statistical inference: theory and practice. Foundations and Trends® in Communications and Information Theory, pages 402-586, 2020.