DROPS

Document

DOI: 10.4230/LIPIcs.ICALP.2018.41

Sample-Optimal Identity Testing with High Probability

Authors: Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0< epsilon, delta < 1, we wish to distinguish, with probability at least 1-delta, whether the distributions are identical versus epsilon-far in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta via black-box amplification, which multiplies the required number of samples by Theta(log(1/delta)). We show that black-box amplification is suboptimal for any delta = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Theta((1/epsilon^2) (sqrt{n log(1/delta)} + log(1/delta))) for any n, epsilon, and delta. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV} (p, U_n) >= epsilon, we simply threshold d_{TV}({p^}, U_n), where {p^} is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant delta case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of epsilon and delta. An important contribution of this work lies in the analysis techniques that we introduce in this context. First, we exploit an underlying strong convexity property to bound from below the expectation gap in the completeness and soundness cases. Second, we give a new, fast method for obtaining provably correct empirical estimates of the true worst-case failure probability for a broad class of uniformity testing statistics over all possible input distributions - including all previously studied statistics for this problem. We believe that our novel analysis techniques will be useful for other distribution testing problems as well.

Cite as

Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Sample-Optimal Identity Testing with High Probability. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 41:1-41:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{diakonikolas_et_al:LIPIcs.ICALP.2018.41,
  author =	{Diakonikolas, Ilias and Gouleakis, Themis and Peebles, John and Price, Eric},
  title =	{{Sample-Optimal Identity Testing with High Probability}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{41:1--41:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.41},
  URN =		{urn:nbn:de:0030-drops-90459},
  doi =		{10.4230/LIPIcs.ICALP.2018.41},
  annote =	{Keywords: distribution testing, property testing, sample complexity}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2017.8

Near-Optimal Closeness Testing of Discrete Histogram Distributions

Authors: Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Abstract

We investigate the problem of testing the equivalence between two discrete histograms. A k-histogram over [n] is a probability distribution that is piecewise constant over some set of k intervals over [n]. Histograms have been extensively studied in computer science and statistics. Given a set of samples from two k-histogram distributions p, q over [n], we want to distinguish (with high probability) between the cases that p = q and ||p ? q||_1 >= epsilon. The main contribution of this paper is a new algorithm for this testing problem and a nearly matching information-theoretic lower bound. Specifically, the sample complexity of our algorithm matches our lower bound up to a logarithmic factor, improving on previous work by polynomial factors in the relevant parameters. Our algorithmic approach applies in a more general setting and yields improved sample upper bounds for testing closeness of other structured distributions as well.

Cite as

Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. Near-Optimal Closeness Testing of Discrete Histogram Distributions. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 8:1-8:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{diakonikolas_et_al:LIPIcs.ICALP.2017.8,
  author =	{Diakonikolas, Ilias and Kane, Daniel M. and Nikishkin, Vladimir},
  title =	{{Near-Optimal Closeness Testing of Discrete Histogram Distributions}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{8:1--8:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.8},
  URN =		{urn:nbn:de:0030-drops-74937},
  doi =		{10.4230/LIPIcs.ICALP.2017.8},
  annote =	{Keywords: distribution testing, histograms, closeness testing}
}

Document

DOI: 10.4230/LIPIcs.STACS.2016.25

Testing Shape Restrictions of Discrete Distributions

Authors: Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld

Published in: LIPIcs, Volume 47, 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)

Abstract

We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution D over [n] and a property P, the goal is to distinguish between D in P and l_{1}(D,P)>epsilon. We develop a general algorithm for this question, which applies to a large range of "shape-constrained" properties, including monotone, log-concave, t-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases considered, our algorithm has near-optimal sample complexity with regard to the domain size and is computationally efficient. For most of these classes, we provide the first non-trivial tester in the literature. In addition, we also describe a generic method to prove lower bounds for this problem, and use it to show our upper bounds are nearly tight. Finally, we extend some of our techniques to tolerant testing, deriving nearly-tight upper and lower bounds for the corresponding questions.

Cite as

Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing Shape Restrictions of Discrete Distributions. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 25:1-25:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{canonne_et_al:LIPIcs.STACS.2016.25,
  author =	{Canonne, Cl\'{e}ment L. and Diakonikolas, Ilias and Gouleakis, Themis and Rubinfeld, Ronitt},
  title =	{{Testing Shape Restrictions of Discrete Distributions}},
  booktitle =	{33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)},
  pages =	{25:1--25:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-001-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{47},
  editor =	{Ollinger, Nicolas and Vollmer, Heribert},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2016.25},
  URN =		{urn:nbn:de:0030-drops-57260},
  doi =		{10.4230/LIPIcs.STACS.2016.25},
  annote =	{Keywords: property testing, probability distributions, statistics, lower bounds}
}

Search Results

Documents authored by Diakonikolas, Ilias

Sample-Optimal Identity Testing with High Probability

Abstract

Cite as

Near-Optimal Closeness Testing of Discrete Histogram Distributions

Abstract

Cite as

Testing Shape Restrictions of Discrete Distributions

Abstract

Cite as