License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.APPROX/RANDOM.2022.27
URN: urn:nbn:de:0030-drops-171497
Go to the corresponding LIPIcs Volume Portal

Chakraborty, Sourav ; Fischer, Eldar ; Ghosh, Arijit ; Mishra, Gopinath ; Sen, Sayantan

Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing

LIPIcs-APPROX27.pdf (0.8 MB)


The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far in some distance measure from satisfying it. The task of tolerant testing imposes a further restriction, that distributions close to satisfying the property are also accepted.
This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts. When limiting our scope to label-invariant (symmetric) properties of distributions, we prove that the gap is at most quadratic, ignoring poly-logarithmic factors. Conversely, the property of being the uniform distribution is indeed known to have an almost-quadratic gap.
When moving to general, not necessarily label-invariant properties, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be non-concentrated, that is, the probability mass of the distribution is sufficiently spread out, then it cannot be non-tolerantly tested with o(√n) many samples, where n denotes the universe size. Clearly, this implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using 𝒪(n) many samples. Being non-concentrated is a strong requirement on properties, as we also prove a close to linear lower bound against their tolerant tests.
Apart from the case where the distribution is non-concentrated, we also show if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size s of the universe, then it can be learned using only 𝒪(s) many samples. The learning procedure adapts to the input, and works without knowing s in advance.

BibTeX - Entry

  author =	{Chakraborty, Sourav and Fischer, Eldar and Ghosh, Arijit and Mishra, Gopinath and Sen, Sayantan},
  title =	{{Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
  pages =	{27:1--27:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-249-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{245},
  editor =	{Chakrabarti, Amit and Swamy, Chaitanya},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-171497},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2022.27},
  annote =	{Keywords: Distribution Testing, Tolerant Testing, Non-tolerant Testing, Sample Complexity}

Keywords: Distribution Testing, Tolerant Testing, Non-tolerant Testing, Sample Complexity
Collection: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)
Issue Date: 2022
Date of publication: 15.09.2022

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI