Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH scholarly article en Eden, Talya; Houen, Jakob Bæk Tejs; Narayanan, Shyam; Rosenbaum, Will; Tětek, Jakub License: Creative Commons Attribution 4.0 license (CC BY 4.0)
when quoting this document, please refer to the following
URN: urn:nbn:de:0030-drops-188872

; ; ; ;

Bias Reduction for Sum Estimation



In classical statistics and distribution testing, it is often assumed that elements can be sampled exactly from some distribution 𝒫, and that when an element x is sampled, the probability 𝒫(x) of sampling x is also known. In this setting, recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution 𝒬 that is sufficiently close to 𝒫. This phenomenon raises interesting questions: under what conditions is a "noisy" distribution 𝒬 sufficient, and what is the algorithmic cost of coping with this noise?
In this paper, we investigate these questions for the problem of estimating the sum of a multiset of N real values x_1, …, x_N. This problem is well-studied in the statistical literature in the case 𝒫 = 𝒬, where the Hansen-Hurwitz estimator [Annals of Mathematical Statistics, 1943] is frequently used. We assume that for some (known) distribution 𝒫, values are sampled from a distribution 𝒬 that is pointwise close to 𝒫. That is, there is a parameter γ < 1 such that for all x_i, (1 - γ) 𝒫(i) ≤ 𝒬(i) ≤ (1 + γ) 𝒫(i). For every positive integer k we define an estimator ζ_k for μ = ∑_i x_i whose bias is proportional to γ^k (where our ζ₁ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if 𝒬 is pointwise γ-close to uniform and all x_i ∈ {0, 1}, for any ε > 0, we can estimate μ to within additive error ε N using m = Θ(N^{1-1/k}/ε^{2/k}) samples, where k = ⌈lg ε/lg γ⌉. We then show that this sample complexity is essentially optimal. Interestingly, our upper and lower bounds show that the sample complexity need not vary uniformly with the desired error parameter ε: for some values of ε, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.

BibTeX - Entry

  author =	{Eden, Talya and Houen, Jakob B{\ae}k Tejs and Narayanan, Shyam and Rosenbaum, Will and T\v{e}tek, Jakub},
  title =	{{Bias Reduction for Sum Estimation}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)},
  pages =	{62:1--62:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-296-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{275},
  editor =	{Megow, Nicole and Smith, Adam},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-188872},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2023.62},
  annote =	{Keywords: bias reduction, sum estimation, sublinear time algorithms, sample complexity}

Keywords: bias reduction, sum estimation, sublinear time algorithms, sample complexity
Seminar: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)
Issue date: 2023
Date of publication: 04.09.2023

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI