Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing

Chakraborty, Sourav; Fischer, Eldar; Ghosh, Arijit; Mishra, Gopinath; Sen, Sayantan

doi:10.4230/LIPIcs.APPROX/RANDOM.2022.27

Subject Classification

ACM Subject Classification

Theory of computation → Streaming, sublinear and near linear time algorithms

Keywords

Distribution Testing
Tolerant Testing
Non-tolerant Testing
Sample Complexity

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

Abstract

The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far in some distance measure from satisfying it. The task of tolerant testing imposes a further restriction, that distributions close to satisfying the property are also accepted. This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts. When limiting our scope to label-invariant (symmetric) properties of distributions, we prove that the gap is at most quadratic, ignoring poly-logarithmic factors. Conversely, the property of being the uniform distribution is indeed known to have an almost-quadratic gap. When moving to general, not necessarily label-invariant properties, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be non-concentrated, that is, the probability mass of the distribution is sufficiently spread out, then it cannot be non-tolerantly tested with o(√n) many samples, where n denotes the universe size. Clearly, this implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using 𝒪(n) many samples. Being non-concentrated is a strong requirement on properties, as we also prove a close to linear lower bound against their tolerant tests. Apart from the case where the distribution is non-concentrated, we also show if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size s of the universe, then it can be learned using only 𝒪(s) many samples. The learning procedure adapts to the input, and works without knowing s in advance.

Cite As Get BibTex

Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, and Sayantan Sen. Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 27:1-27:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2022.27

Author Details

Sourav Chakraborty

Indian Statistical Institute, Kolkata, India

Eldar Fischer

Technion - Israel Institute of Technology, Haifa, Israel

Arijit Ghosh

Indian Statistical Institute, Kolkata, India

Gopinath Mishra

University of Warwick, Coventry, UK

Sayantan Sen

Indian Statistical Institute, Kolkata, India

References

Jayadev Acharya, Clément L. Canonne, Cody Freitag, Ziteng Sun, and Himanshu Tyagi. Inference under information constraints III: local privacy constraints. IEEE J. Sel. Areas Inf. Theory, pages 253-267, 2021. URL: https://doi.org/10.1109/JSAIT.2021.3053569.
Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. Optimal testing for properties of distributions. In NIPS, pages 3591-3599, 2015. URL: https://proceedings.neurips.cc/paper/2015/hash/1f36c15d6a3d18d52e8d493bc8187cb9-Abstract.html.
Jayadev Acharya, Ilias Diakonikolas, Jerry Li, and Ludwig Schmidt. Sample-optimal density estimation in nearly-linear time. In SODA, pages 1278-1289, 2017. URL: https://doi.org/10.1137/1.9781611974782.83.
Jayadev Acharya, Peter Kairouz, Yuhan Liu, and Ziteng Sun. Estimating sparse discrete distributions under privacy and communication constraints. In ALT, pages 79-98, 2021. URL: http://proceedings.mlr.press/v132/acharya21b.html.
Maryam Aliakbarpour, Ilias Diakonikolas, Daniel Kane, and Ronitt Rubinfeld. Private testing of distributions via sample permutations. In NeurIPS, pages 10877-10888, 2019. URL: https://proceedings.neurips.cc/paper/2019/hash/8e036cc193d0af59aa9b22821248292b-Abstract.html.
Noga Alon, Eric Blais, Sourav Chakraborty, David García-Soriano, and Arie Matsliah. Nearly tight bounds for testing function isomorphism. SIAM J. Comput., 42(2):459-493, 2013. URL: https://doi.org/10.1137/110832677.
Tugkan Batu and Clément L. Canonne. Generalized uniformity testing. In FOCS, pages 880-889, 2017. URL: https://doi.org/10.1109/FOCS.2017.86.
Tugkan Batu, Lance Fortnow, Eldar Fischer, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In FOCS, pages 442-451, 2001. URL: https://doi.org/10.1109/SFCS.2001.959920.
Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In FOCS, pages 259-269, 2000. URL: https://doi.org/10.1109/SFCS.2000.892113.
Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Mach. Learn., pages 151-175, 2010. URL: https://doi.org/10.1007/s10994-009-5152-4.
Dimitris Bertsimas and John N Tsitsiklis. Introduction to linear optimization, volume 6. Athena Scientific Belmont, MA, 1997.
Eric Blais, Clément L. Canonne, Talya Eden, Amit Levi, and Dana Ron. Tolerant junta testing and the connection to submodular optimization and function isomorphism. ACM Trans. Comput. Theory, pages 24:1-24:33, 2019. URL: https://doi.org/10.1145/3337789.
Clément L Canonne. A short note on learning discrete distributions. https://github.com/ccanonne/probabilitydistributiontoolbox/blob/master/learning.pdf, 2020.
Clément L. Canonne. A Survey on Distribution Testing: Your Data is Big. But is it Blue? Theory of Computing Library, 2020. URL: https://doi.org/10.4086/toc.gs.2020.009.
Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing shape restrictions of discrete distributions. Theory Comput. Syst., pages 4-62, 2018. URL: https://doi.org/10.1007/s00224-017-9785-6.
Clément L. Canonne, Ayush Jain, Gautam Kamath, and Jerry Li. The price of tolerance in distribution testing. In COLT, pages 573-624, 2022. URL: https://proceedings.mlr.press/v178/canonne22a.html.
Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, and Sayantan Sen. Exploring the gap between tolerant and non-tolerant distribution testing. CoRR, 2021. URL: http://arxiv.org/abs/2110.09972.
Wei-Ning Chen, Peter Kairouz, and Ayfer Özgür. Breaking the communication-privacy-accuracy trilemma. In NeurIPS, 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/222afbe0d68c61de60374b96f1d86715-Abstract.html.
Gregory W Corder and Dale I Foreman. Nonparametric statistics: A step-by-step approach. John Wiley & Sons, 2014.
Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, 2001. URL: https://doi.org/10.1002/0471200611.
Constantinos Daskalakis, Gautam Kamath, and John Wright. Which distribution distances are sublinearly testable? In SODA, pages 2747-2764, 2018. URL: https://doi.org/10.1137/1.9781611975031.175.
Ilias Diakonikolas and Daniel M. Kane. A new approach for testing properties of discrete distributions. In FOCS, pages 685-694, 2016. URL: https://doi.org/10.1109/FOCS.2016.78.
Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures. In FOCS, pages 73-84, 2017. URL: https://doi.org/10.1109/FOCS.2017.16.
Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Sharp bounds for generalized uniformity testing. In NeurIPS, pages 6204-6213, 2018. URL: https://proceedings.neurips.cc/paper/2018/hash/fc325d4b598aaede18b53dca4ecfcb9c-Abstract.html.
Eldar Fischer, Oded Lachish, and Yadu Vasudev. Trading query complexity for sample-based testing and multi-testing scalability. In FOCS, pages 1163-1182, 2015. URL: https://doi.org/10.1109/FOCS.2015.75.
Eldar Fischer, Oded Lachish, and Yadu Vasudev. Improving and extending the testing of distributions for shape-restricted properties. In STACS, pages 31:1-31:14, 2017. URL: https://doi.org/10.4230/LIPIcs.STACS.2017.31.
Eldar Fischer and Arie Matsliah. Testing graph isomorphism. SIAM J. Comput., pages 207-225, 2008. URL: https://doi.org/10.1137/070680795.
Oded Goldreich. Introduction to Property Testing. Cambridge University Press, 2017. URL: https://doi.org/10.1017/9781108135252.
Oded Goldreich. Testing isomorphism in the bounded-degree graph model. Electron. Colloquium Comput. Complex., page 102, 2019. URL: https://eccc.weizmann.ac.il/report/2019/102.
Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Electron. Colloquium Comput. Complex., 7(20), 2000. URL: https://eccc.weizmann.ac.il/eccc-reports/2000/TR00-020/index.html.
Oded Goldreich and Dana Ron. On sample-based testers. ACM Trans. Comput. Theory, pages 7:1-7:54, 2016. URL: https://doi.org/10.1145/2898355.
Sivakanth Gopi, Gautam Kamath, Janardhan Kulkarni, Aleksandar Nikolov, Zhiwei Steven Wu, and Huanyu Zhang. Locally private hypothesis selection. In COLT, pages 1785-1816, 2020. URL: http://proceedings.mlr.press/v125/gopi20a.html.
Terry King. A guide to chi-squared testing. Taylor & Francis, 1997.
David J. C. MacKay. Information theory, inference, and learning algorithms. Cambridge University Press, 2003.
Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Trans. Inf. Theory, pages 4750-4755, 2008. URL: https://doi.org/10.1109/TIT.2008.928987.
Michal Parnas, Dana Ron, and Ronitt Rubinfeld. Tolerant property testing and distance approximation. J. Comput. Syst. Sci., pages 1012-1042, 2006. URL: https://doi.org/10.1016/j.jcss.2006.03.002.
Gregory Valiant and Paul Valiant. The power of linear estimators. In FOCS, pages 403-412, 2011. URL: https://doi.org/10.1109/FOCS.2011.81.
Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. SIAM J. Comput., pages 429-455, 2017. URL: https://doi.org/10.1137/151002526.
Gregory Valiant and Paul Valiant. Estimating the unseen: Improved estimators for entropy and other properties. J. ACM, pages 37:1-37:41, 2017. URL: https://doi.org/10.1145/3125643.
Paul Valiant. Testing symmetric properties of distributions. SIAM J. Comput., pages 1927-1968, 2011. URL: https://doi.org/10.1137/080734066.
Huanyu Zhang. Statistical inference in the differential privacy model. CoRR, abs/2108.05000, 2021. URL: http://arxiv.org/abs/2108.05000.

Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing

Authors Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, Sayantan Sen

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Acknowledgements

References

Thanks for your feedback!

Could not send message

Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing

Authors Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, Sayantan Sen

File

Document Identifiers

Related Versions

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

Acknowledgements

References

Thanks for your feedback!

Could not send message