A Chasm Between Identity and Equivalence Testing with Conditional Queries

Acharya, Jayadev; Canonne, Clément L.; Kamath, Gautam

doi:10.4230/LIPIcs.APPROX-RANDOM.2015.449

File

LIPIcs.APPROX-RANDOM.2015.449.pdf

Filesize: 0.6 MB
18 pages

Document Identifiers

DOI: 10.4230/LIPIcs.APPROX-RANDOM.2015.449
URN: urn:nbn:de:0030-drops-53178

Author Details

Jayadev Acharya

Clément L. Canonne

Gautam Kamath

Cite AsGet BibTex

Jayadev Acharya, Clément L. Canonne, and Gautam Kamath. A Chasm Between Identity and Equivalence Testing with Conditional Queries. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 40, pp. 449-466, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)
https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2015.449

Abstract

A recent model for property testing of probability distributions enables tremendous savings in the sample complexity of testing algorithms, by allowing them to condition the sampling on subsets of the domain. In particular, Canonne, Ron, and Servedio showed that, in this setting, testing identity of an unknown distribution D (i.e., whether D = D* for an explicitly known D*) can be done with a constant number of samples, independent of the support size n - in contrast to the required sqrt(n) in the standard sampling model. However, it was unclear whether the same held for the case of testing equivalence, where both distributions are unknown. Indeed, while Canonne, Ron, and Servedio established a polylog(n)-query upper bound for equivalence testing, very recently brought down to ~O(log(log(n))) by Falahatgar et al., whether a dependence on the domain size n is necessary was still open, and explicitly posed by Fischer at the Bertinoro Workshop on Sublinear Algorithms. In this work, we answer the question in the positive, showing that any testing algorithm for equivalence must make Omega(sqrt(log(log(n)))) queries in the conditional sampling model. Interestingly, this demonstrates an intrinsic qualitative gap between identity and equivalence testing, absent in the standard sampling model (where both problems have sampling complexity n^(Theta(1))). Turning to another question, we investigate the complexity of support size estimation. We provide a doubly-logarithmic upper bound for the adaptive version of this problem, generalizing work of Ron and Tsur to our weaker model. We also establish a logarithmic lower bound for the non-adaptive version of this problem. This latter result carries on to the related problem of non-adaptive uniformity testing, an exponential improvement over previous results that resolves an open question of Chakraborty, Fischer, Goldhirsh, and Matsliah.

Keywords

property testing
probability distributions
conditional samples

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Jayadev Acharya, Clément L. Canonne, and Gautam Kamath. A chasm between identity and equivalence testing with conditional queries. ArXiV, abs/1411.7346, April 2015.
Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, and Shengjun Pan. Competitive closeness testing. In Proceedings of 24th COLT, pages 47-68, 2011.
Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, Shengjun Pan, and Ananda Theertha Suresh. Competitive classification and closeness testing. In Proceedings of 25th COLT, pages 1-18, 2012.
Tuğkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In Proceedings of FOCS, pages 442-451, 2001.
Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing closeness of discrete distributions. Journal of the ACM, 60(1):1-25, 2013.
Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of STOC, pages 381-390, New York, NY, USA, 2004. ACM.
Arnab Bhattacharyya, Eldar Fischer, Ronitt Rubinfeld, and Paul Valiant. Testing monotonicity of distributions over general partial orders. In Proceedings of ITCS, pages 239-252, 2011.
Clément L. Canonne. A Survey on Distribution Testing: your data is Big, but is it Blue? Electronic Colloquium on Computational Complexity (ECCC), TR15-063, April 2015.
Clément L. Canonne, Dana Ron, and Rocco A. Servedio. Testing probability distributions using conditional samples. SIAM Journal on Computing, 44(3), 2015.
Clément L. Canonne and Ronitt Rubinfeld. Testing probability distributions underlying aggregated data. In Proceedings of ICALP, pages 283-295, 2014.
Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, and Arie Matsliah. On the power of conditional samples in distribution testing. In Proceedings of ITCS, pages 561-580, New York, NY, USA, 2013. ACM.
Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of SODA, pages 1193-1203. Society for Industrial and Applied Mathematics (SIAM), 2014.
Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapathi, and Ananda Theertha Suresh. Faster algorithms for testing under conditional sampling. In Proceedings of 28th COLT, 2015.
Eldar Fischer. List of Open Problems in Sublinear Algorithms: Problem 66. http://sublinear.info/66. Suggested by Fischer at Bertinoro Workshop on Sublinear Algorithms 2014.
Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Electronic Colloquium on Computational Complexity (ECCC), TR00-020, March 2000.
Sudipto Guha, Andrew McGregor, and Suresh Venkatasubramanian. Streaming and sublinear approximation of entropy and information distances. In Proceedings of SODA, pages 733-742. Society for Industrial and Applied Mathematics (SIAM), 2006.
Piotr Indyk, Reut Levi, and Ronitt Rubinfeld. Approximating and Testing k-Histogram Distributions in Sub-linear Time. In Proceedings of PODS, pages 15-22, 2012.
Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9(8):295-347, 2013.
Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, New York, NY, USA, 1995.
Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750-4755, 2008.
Sofya Raskhodnikova, Dana Ron, Amir Shpilka, and Adam Smith. Strong lower bounds for approximating distributions support size and the distinct elements problem. SIAM Journal on Computing, 39(3):813-842, 2009.
Dana Ron and Gilad Tsur. The power of an example: Hidden set size approximation using group queries and conditional sampling. ArXiV, abs/1404.5568, 2014.
Ronitt Rubinfeld. Taming Big Probability Distributions. XRDS, 19(1):24-28, September 2012.
Ronitt Rubinfeld and Rocco A. Servedio. Testing monotone high-dimensional distributions. Random Structures and Algorithms, 34(1):24-44, January 2009.
L. Stockmeyer. On approximation algorithms for #P. SIAM Journal on Computing, 14(4):849-861, 1985.
Gregory Valiant and Paul Valiant. A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC), TR10-179, 2010.
Gregory Valiant and Paul Valiant. Estimating the unseen: A sublinear-sample canonical estimator of distributions. Electronic Colloquium on Computational Complexity (ECCC), TR10-180, 2010.
Gregory Valiant and Paul Valiant. The power of linear estimators. In Proceedings of FOCS, pages 403-412, October 2011. See also [Valiant and Valiant, 2010] and [Valiant and Valiant, 2010].
Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. In Proceedings of FOCS, 2014.
Paul Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing, 40(6):1927-1968, 2011.