Support Size Estimation: The Power of Conditioning

Chakraborty, Diptarka; Kumar, Gunjan; Meel, Kuldeep S.

doi:10.4230/LIPIcs.MFCS.2023.33

Subject Classification

ACM Subject Classification

Theory of computation → Streaming, sublinear and near linear time algorithms

Keywords

Support-size estimation
Distribution testing
Conditional sampling
Lower bound

Abstract

We consider the problem of estimating the support size of a distribution D. Our investigations are pursued through the lens of distribution testing and seek to understand the power of conditional sampling (denoted as COND), wherein one is allowed to query the given distribution conditioned on an arbitrary subset S. The primary contribution of this work is to introduce a new approach to lower bounds for the COND model that relies on using powerful tools from information theory and communication complexity. Our approach allows us to obtain surprisingly strong lower bounds for the COND model and its extensions.

- We bridge the longstanding gap between the upper bound O(log log n + 1/ε²) and the lower bound Ω(√(log log n)) for the COND model by providing a nearly matching lower bound. Surprisingly, we show that even if we get to know the actual probabilities along with COND samples, Ω(log log n + 1/(ε² log(1/ε))) queries are still necessary.
- We obtain the first non-trivial lower bound for the COND model equipped with an additional oracle that reveals the actual as well as the conditional probabilities of the samples (to the best of our knowledge, this subsumes all of the models previously studied): in particular, we demonstrate that Ω(log log log n + 1/(ε² log(1/ε))) queries are necessary.
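
To make the query model concrete, the following is a minimal Python sketch of a COND oracle over a small, explicitly known distribution. It is an illustration only, not code from the paper: the names CondOracle, sample, and brute_force_support_size are hypothetical, and returning None when the queried subset S has zero probability mass is an assumed convention (the literature uses several). The brute-force routine is a naive baseline that spends one singleton query per domain element, which is the linear cost that the O(log log n + 1/ε²) upper bound in the abstract avoids.

```python
import random
from typing import Dict, Hashable, Iterable, Optional


class CondOracle:
    """Toy COND oracle for an explicitly given distribution (illustration only)."""

    def __init__(self, probs: Dict[Hashable, float], seed: Optional[int] = None):
        # Keep only elements with positive mass; these form the support of D.
        self.probs = {x: p for x, p in probs.items() if p > 0}
        self.rng = random.Random(seed)
        self.queries = 0  # number of COND queries made so far

    def sample(self, subset: Iterable[Hashable]) -> Optional[Hashable]:
        """Return one sample from D conditioned on `subset`.

        Assumed convention: return None when D(subset) = 0.
        """
        self.queries += 1
        items = [x for x in subset if x in self.probs]
        if not items:
            return None
        weights = [self.probs[x] for x in items]
        # random.choices normalizes the weights, i.e. samples from D restricted to S.
        return self.rng.choices(items, weights=weights, k=1)[0]


def brute_force_support_size(oracle: CondOracle, domain: Iterable[Hashable]) -> int:
    """Naive exact baseline: one singleton query per element of the domain."""
    return sum(1 for x in domain if oracle.sample([x]) is not None)


if __name__ == "__main__":
    oracle = CondOracle({1: 0.7, 2: 0.2, 4: 0.1}, seed=0)
    print(brute_force_support_size(oracle, range(1, 9)), oracle.queries)
```

Conditioning on a singleton {x} tells us whether x lies in the support, so the baseline is exact but uses n queries on a domain of size n; the bounds in the abstract concern how far adaptive choices of S can reduce that count.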

Cite As

Diptarka Chakraborty, Gunjan Kumar, and Kuldeep S. Meel. Support Size Estimation: The Power of Conditioning. In 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 272, pp. 33:1-33:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.MFCS.2023.33

Author Details

Diptarka Chakraborty

National University of Singapore, Singapore

Gunjan Kumar

National University of Singapore, Singapore

Kuldeep S. Meel

National University of Singapore, Singapore
