Testing Distributions of Huge Objects

Goldreich, Oded; Ron, Dana

doi:10.4230/LIPIcs.ITCS.2022.78

File

LIPIcs.ITCS.2022.78.pdf

Filesize: 0.83 MB
19 pages

Document Identifiers

DOI: 10.4230/LIPIcs.ITCS.2022.78
URN: urn:nbn:de:0030-drops-156747

Author Details

Oded Goldreich

Department of Computer Science, Weizmann Institute of Science, Israel

Dana Ron

School of Electrical Engineering, Tel Aviv University, Israel

Acknowledgements

We are grateful to Avi Wigderson for a discussion that started this research project.

Cite As

Oded Goldreich and Dana Ron. Testing Distributions of Huge Objects. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 78:1-78:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ITCS.2022.78

Abstract

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings. Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings). Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings) as well as for the distance between objects (resp., strings). Specifically, the distance between distributions is defined as the earth mover’s distance with respect to the relative Hamming distance between strings. We study the query complexity of testing in this new model, focusing on three directions. First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model. Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings). Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model. Two such cases are uniform distributions and pairs of identical distributions, where we obtain the following results.

- Testing whether a distribution over n-bit long strings is uniform on some set of size m can be done with query complexity Õ(m/ε³), where ε > (log₂m)/n is the proximity parameter.
- Testing whether two distributions over n-bit long strings that have support size at most m are identical can be done with query complexity Õ(m^{2/3}/ε³).

Both upper bounds are quite tight; that is, for ε = Ω(1), the first task requires Ω(m^c) queries for any c < 1 and n = ω(log m), whereas the second task requires Ω(m^{2/3}) queries. Note that the query complexity of the first task is higher than the sample complexity of the corresponding task in the standard distribution testing model, whereas in the case of the second task the bounds almost match.
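To make the distance measure in the abstract concrete, the following is a minimal sketch (not taken from the paper) that computes the earth mover's distance with respect to relative Hamming distance in one easy special case: two uniform distributions over equal-size multisets of strings. In that case the optimal transport is a perfect matching, so a brute-force search over permutations suffices for tiny inputs. The function names `relative_hamming` and `emd_uniform` are hypothetical, chosen only for this illustration.

```python
from itertools import permutations


def relative_hamming(x: str, y: str) -> float:
    """Fraction of positions at which two equal-length strings differ."""
    if len(x) != len(y) or not x:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def emd_uniform(xs: list[str], ys: list[str]) -> float:
    """Earth mover's distance between the uniform distributions on two
    equal-size multisets of strings, with relative Hamming distance as
    the ground metric.

    Assumption for this sketch: both distributions are uniform over
    multisets of the same size m, so the minimum-cost transport is a
    perfect matching. The search is brute force (m! matchings) and is
    meant only for very small m.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("multisets must be non-empty and of equal size")
    m = len(xs)
    return min(
        sum(relative_hamming(x, ys[j]) for x, j in zip(xs, perm)) / m
        for perm in permutations(range(m))
    )


# Two distributions over 4-bit strings whose supports differ in one element.
p = ["0000", "1111"]
q = ["0001", "1111"]
print(emd_uniform(p, q))  # 0.125
```

In this example the total variation distance between the two distributions is 1/2, since half of the probability mass sits on different strings, whereas the earth mover's distance is only 1/8, because the differing strings disagree on a single bit out of four. This illustrates why the new model treats distributions over nearly identical long strings as close.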

Subject Classification

ACM Subject Classification

Theory of computation → Streaming, sublinear and near linear time algorithms

Keywords

Property Testing
Distributions
