A Chasm Between Identity and Equivalence Testing with Conditional Queries

Authors Jayadev Acharya, Clément L. Canonne, Gautam Kamath



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2015.449.pdf
  • Filesize: 0.6 MB
  • 18 pages

Document Identifiers

Author Details

Jayadev Acharya
Clément L. Canonne
Gautam Kamath

Cite As Get BibTex

Jayadev Acharya, Clément L. Canonne, and Gautam Kamath. A Chasm Between Identity and Equivalence Testing with Conditional Queries. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 40, pp. 449-466, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015) https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2015.449

Abstract

A recent model for property testing of probability distributions enables tremendous savings in the sample complexity of testing algorithms, by allowing them to condition the sampling on subsets of the domain.

In particular, Canonne, Ron, and Servedio showed that, in this setting, testing identity of an unknown distribution D (i.e., whether D = D* for an explicitly known D*) can be done with a constant number of samples, independent of the support size n - in contrast to the required sqrt(n) in the standard sampling model. However, it was unclear whether the same held for the case of testing equivalence, where both distributions are unknown. Indeed, while Canonne, Ron, and Servedio established a polylog(n)-query upper bound for equivalence testing, very recently brought down to ~O(log(log(n))) by Falahatgar et al., whether a dependence on the domain size n is necessary was still open, and explicitly posed by Fischer at the Bertinoro Workshop on Sublinear Algorithms. In this work, we answer the question in the positive, showing that any testing algorithm for equivalence must make Omega(sqrt(log(log(n)))) queries in the conditional sampling model. Interestingly, this demonstrates an intrinsic qualitative gap between identity and equivalence testing, absent in the standard sampling model (where both problems have sampling complexity n^(Theta(1))).

Turning to another question, we investigate the complexity of support size estimation. We provide a doubly-logarithmic upper bound for the adaptive version of this problem, generalizing work of Ron and Tsur to our weaker model. We also establish a logarithmic lower bound for the non-adaptive version of this problem. This latter result carries on to the related problem of non-adaptive uniformity testing, an exponential improvement over previous results that resolves an open question of Chakraborty, Fischer, Goldhirsh, and Matsliah.

Subject Classification

Keywords
  • property testing
  • probability distributions
  • conditional samples

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jayadev Acharya, Clément L. Canonne, and Gautam Kamath. A chasm between identity and equivalence testing with conditional queries. ArXiV, abs/1411.7346, April 2015. Google Scholar
  2. Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, and Shengjun Pan. Competitive closeness testing. In Proceedings of 24th COLT, pages 47-68, 2011. Google Scholar
  3. Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, Shengjun Pan, and Ananda Theertha Suresh. Competitive classification and closeness testing. In Proceedings of 25th COLT, pages 1-18, 2012. Google Scholar
  4. Tuğkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In Proceedings of FOCS, pages 442-451, 2001. Google Scholar
  5. Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing closeness of discrete distributions. Journal of the ACM, 60(1):1-25, 2013. Google Scholar
  6. Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of STOC, pages 381-390, New York, NY, USA, 2004. ACM. Google Scholar
  7. Arnab Bhattacharyya, Eldar Fischer, Ronitt Rubinfeld, and Paul Valiant. Testing monotonicity of distributions over general partial orders. In Proceedings of ITCS, pages 239-252, 2011. Google Scholar
  8. Clément L. Canonne. A Survey on Distribution Testing: your data is Big, but is it Blue? Electronic Colloquium on Computational Complexity (ECCC), TR15-063, April 2015. Google Scholar
  9. Clément L. Canonne, Dana Ron, and Rocco A. Servedio. Testing probability distributions using conditional samples. SIAM Journal on Computing, 44(3), 2015. Google Scholar
  10. Clément L. Canonne and Ronitt Rubinfeld. Testing probability distributions underlying aggregated data. In Proceedings of ICALP, pages 283-295, 2014. Google Scholar
  11. Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, and Arie Matsliah. On the power of conditional samples in distribution testing. In Proceedings of ITCS, pages 561-580, New York, NY, USA, 2013. ACM. Google Scholar
  12. Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of SODA, pages 1193-1203. Society for Industrial and Applied Mathematics (SIAM), 2014. Google Scholar
  13. Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapathi, and Ananda Theertha Suresh. Faster algorithms for testing under conditional sampling. In Proceedings of 28th COLT, 2015. Google Scholar
  14. Eldar Fischer. List of Open Problems in Sublinear Algorithms: Problem 66. http://sublinear.info/66. Suggested by Fischer at Bertinoro Workshop on Sublinear Algorithms 2014.
  15. Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Electronic Colloquium on Computational Complexity (ECCC), TR00-020, March 2000. Google Scholar
  16. Sudipto Guha, Andrew McGregor, and Suresh Venkatasubramanian. Streaming and sublinear approximation of entropy and information distances. In Proceedings of SODA, pages 733-742. Society for Industrial and Applied Mathematics (SIAM), 2006. Google Scholar
  17. Piotr Indyk, Reut Levi, and Ronitt Rubinfeld. Approximating and Testing k-Histogram Distributions in Sub-linear Time. In Proceedings of PODS, pages 15-22, 2012. Google Scholar
  18. Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9(8):295-347, 2013. Google Scholar
  19. Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, New York, NY, USA, 1995. Google Scholar
  20. Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750-4755, 2008. Google Scholar
  21. Sofya Raskhodnikova, Dana Ron, Amir Shpilka, and Adam Smith. Strong lower bounds for approximating distributions support size and the distinct elements problem. SIAM Journal on Computing, 39(3):813-842, 2009. Google Scholar
  22. Dana Ron and Gilad Tsur. The power of an example: Hidden set size approximation using group queries and conditional sampling. ArXiV, abs/1404.5568, 2014. Google Scholar
  23. Ronitt Rubinfeld. Taming Big Probability Distributions. XRDS, 19(1):24-28, September 2012. Google Scholar
  24. Ronitt Rubinfeld and Rocco A. Servedio. Testing monotone high-dimensional distributions. Random Structures and Algorithms, 34(1):24-44, January 2009. Google Scholar
  25. L. Stockmeyer. On approximation algorithms for #P. SIAM Journal on Computing, 14(4):849-861, 1985. Google Scholar
  26. Gregory Valiant and Paul Valiant. A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC), TR10-179, 2010. Google Scholar
  27. Gregory Valiant and Paul Valiant. Estimating the unseen: A sublinear-sample canonical estimator of distributions. Electronic Colloquium on Computational Complexity (ECCC), TR10-180, 2010. Google Scholar
  28. Gregory Valiant and Paul Valiant. The power of linear estimators. In Proceedings of FOCS, pages 403-412, October 2011. See also [Valiant and Valiant, 2010] and [Valiant and Valiant, 2010]. Google Scholar
  29. Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. In Proceedings of FOCS, 2014. Google Scholar
  30. Paul Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing, 40(6):1927-1968, 2011. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail