Testing Distributions of Huge Objects

Authors Oded Goldreich, Dana Ron



Author Details

Oded Goldreich
  • Department of Computer Science, Weizmann Institute of Science, Israel
Dana Ron
  • School of Electrical Engineering, Tel Aviv University, Israel

Acknowledgements

We are grateful to Avi Wigderson for a discussion that started this research project.

Cite As

Oded Goldreich and Dana Ron. Testing Distributions of Huge Objects. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 78:1-78:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ITCS.2022.78

Abstract

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings. Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings). Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings) as well as for the distance between objects (resp., strings). In particular, the distance between distributions is defined as the earth mover’s distance with respect to the relative Hamming distance between strings.

We study the query complexity of testing in this new model, focusing on three directions. First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model. Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings). Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model. Two such cases are uniform distributions and pairs of identical distributions, where we obtain the following results.

  • Testing whether a distribution over n-bit long strings is uniform on some set of size m can be done with query complexity Õ(m/ε³), where ε > (log₂m)/n is the proximity parameter.
  • Testing whether two distributions over n-bit long strings that have support size at most m are identical can be done with query complexity Õ(m^{2/3}/ε³).

Both upper bounds are quite tight; that is, for ε = Ω(1), the first task requires Ω(m^c) queries for any c < 1 and n = ω(log m), whereas the second task requires Ω(m^{2/3}) queries. Note that the query complexity of the first task is higher than the sample complexity of the corresponding task in the standard distribution testing model, whereas in the case of the second task the bounds almost match.
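To make the distance measure in the abstract concrete, the following is a minimal Python sketch of the earth mover's distance with the relative Hamming distance as ground metric. It is not taken from the paper: the function names `relative_hamming` and `emd_uniform_multisets` are illustrative, and the sketch covers only the special case of two uniform distributions over equally sized multisets of strings. In that case an optimal transport plan can be taken to be a perfect matching, so a brute-force search over matchings suffices for tiny inputs.

```python
from itertools import permutations


def relative_hamming(x: str, y: str) -> float:
    """Fraction of positions on which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    if not x:
        return 0.0
    return sum(a != b for a, b in zip(x, y)) / len(x)


def emd_uniform_multisets(xs, ys) -> float:
    """Earth mover's distance between the uniform distributions on two
    equal-size multisets of strings, with relative Hamming distance as
    the ground metric.

    Assumption (for illustration only): both distributions are uniform
    over k listed strings, so the optimum is attained by a perfect
    matching. All k! matchings are tried, so keep k very small.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty multisets of equal size")
    k = len(xs)
    best = min(
        sum(relative_hamming(xs[i], ys[p[i]]) for i in range(k))
        for p in permutations(range(k))
    )
    return best / k  # each matched pair carries probability mass 1/k


if __name__ == "__main__":
    X = ["0000", "1111"]
    Y = ["0001", "1111"]
    # Half of the mass moves between strings that differ in 1 of 4 bits.
    print(emd_uniform_multisets(X, Y))  # 0.125
```

In this toy example the two distributions have disjoint support on half of their mass, yet their distance under this measure is only 0.125, because the strings "0000" and "0001" are close in relative Hamming distance.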

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Property Testing
  • Distributions

