Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH
Scholarly article (en)
Canonne, Clément L.; Wimmer, Karl
https://www.dagstuhl.de/lipics
License: Creative Commons Attribution 3.0 Unported license (CC-BY 3.0)
When quoting this document, please refer to the following:
DOI: 10.4230/LIPIcs.APPROX/RANDOM.2020.24
URN: urn:nbn:de:0030-drops-126277
URL: https://drops.dagstuhl.de/opus/volltexte/2020/12627


Testing Data Binnings



Abstract

Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) 𝐪 and samples from an unknown distribution 𝐩, both over [n] = {1,2,… ,n}, whether 𝐩 equals 𝐪, or is significantly different from it. In this paper, we introduce the related question of identity up to binning, where the reference distribution 𝐪 is over k ≪ n elements: the question is then whether there exists a suitable binning of the domain [n] into k intervals such that, once "binned," 𝐩 is equal to 𝐪. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne [Clément L. Canonne, 2019]. Finally, we discuss several extensions and related research directions.
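To make the notion of "identity up to binning" concrete, the following is a minimal sketch of the exact, fully known version of the question: given explicit distributions 𝐩 over [n] and 𝐪 over [k], is there a partition of [n] into k consecutive intervals whose masses under 𝐩 match 𝐪? This is not the sample-based tester studied in the paper, only an illustration of the definition. The function name `exists_binning` is hypothetical, exact rational arithmetic is used to avoid floating-point comparisons, and the sketch assumes that empty intervals are allowed, which may differ from the paper's exact convention.

```python
from fractions import Fraction
from itertools import accumulate


def exists_binning(p, q):
    """Return True if p can be merged into len(q) consecutive intervals
    whose total masses equal q, entry by entry.

    Assumptions (not taken from the paper): p and q are sequences of
    Fractions, and intervals are allowed to be empty.
    """
    if sum(p) != sum(q):
        return False
    # Masses reachable by cutting p after a prefix (including the empty prefix).
    prefix_masses = {Fraction(0)} | set(accumulate(p))
    # A valid binning exists iff every cumulative mass of q is such a prefix
    # mass: both cumulative sequences are nondecreasing, so the cut points
    # can always be chosen in increasing order.
    return all(target in prefix_masses for target in accumulate(q))


p = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 8), Fraction(3, 8)]
print(exists_binning(p, [Fraction(1, 2), Fraction(1, 2)]))  # True: bins {1,2} and {3,4}
print(exists_binning(p, [Fraction(1, 3), Fraction(2, 3)]))  # False: no prefix of p has mass 1/3
```

In the testing setting described in the abstract, 𝐩 is only accessible through samples, so this exact check is unavailable, and the question becomes how many samples are needed to distinguish the two cases.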

BibTeX - Entry

@InProceedings{canonne_et_al:LIPIcs:2020:12627,
  author =	{Cl{\'e}ment L. Canonne and Karl Wimmer},
  title =	{{Testing Data Binnings}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)},
  pages =	{24:1--24:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-164-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{176},
  editor =	{Jaros{\l}aw Byrka and Raghu Meka},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/12627},
  URN =		{urn:nbn:de:0030-drops-126277},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2020.24},
  annote =	{Keywords: property testing, distribution testing, identity testing, hypothesis testing}
}

Keywords: property testing, distribution testing, identity testing, hypothesis testing
Seminar: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)
Issue date: 2020
Date of publication: 11.08.2020

