Abstract
The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far in some distance measure from satisfying it. The task of tolerant testing imposes a further restriction, that distributions close to satisfying the property are also accepted.
This work focuses on the connection between the sample complexities of nontolerant testing of distributions and their tolerant testing counterparts. When limiting our scope to labelinvariant (symmetric) properties of distributions, we prove that the gap is at most quadratic, ignoring polylogarithmic factors. Conversely, the property of being the uniform distribution is indeed known to have an almostquadratic gap.
When moving to general, not necessarily labelinvariant properties, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be nonconcentrated, that is, the probability mass of the distribution is sufficiently spread out, then it cannot be nontolerantly tested with o(√n) many samples, where n denotes the universe size. Clearly, this implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using 𝒪(n) many samples. Being nonconcentrated is a strong requirement on properties, as we also prove a close to linear lower bound against their tolerant tests.
Apart from the case where the distribution is nonconcentrated, we also show if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size s of the universe, then it can be learned using only 𝒪(s) many samples. The learning procedure adapts to the input, and works without knowing s in advance.
BibTeX  Entry
@InProceedings{chakraborty_et_al:LIPIcs.APPROX/RANDOM.2022.27,
author = {Chakraborty, Sourav and Fischer, Eldar and Ghosh, Arijit and Mishra, Gopinath and Sen, Sayantan},
title = {{Exploring the Gap Between Tolerant and NonTolerant Distribution Testing}},
booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
pages = {27:127:23},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772495},
ISSN = {18688969},
year = {2022},
volume = {245},
editor = {Chakrabarti, Amit and Swamy, Chaitanya},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/17149},
URN = {urn:nbn:de:0030drops171497},
doi = {10.4230/LIPIcs.APPROX/RANDOM.2022.27},
annote = {Keywords: Distribution Testing, Tolerant Testing, Nontolerant Testing, Sample Complexity}
}
Keywords: 

Distribution Testing, Tolerant Testing, Nontolerant Testing, Sample Complexity 
Collection: 

Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022) 
Issue Date: 

2022 
Date of publication: 

15.09.2022 