Proofs of Proximity for Distribution Testing

Chiesa, Alessandro; Gur, Tom

doi:10.4230/LIPIcs.ITCS.2018.53

Abstract

Distribution testing is an area of property testing that studies algorithms that receive few samples from a probability distribution D and decide whether D has a certain property or is far (in total variation distance) from all distributions with that property. Most natural properties of distributions, however, require a large number of samples to test, which motivates the question of whether there are natural settings wherein fewer samples suffice.

We initiate a study of proofs of proximity for properties of distributions. In their basic form, these proof systems consist of a tester that not only has sample access to a distribution but also explicit access to a proof string that depends on the distribution. We refer to these as NP distribution testers, or MA distribution testers if the tester is a probabilistic algorithm. We also study the more general notion of IP distribution testers, in which the tester interacts with an all-powerful untrusted prover.

We investigate the power and limitations of proofs of proximity for distributions and chart a landscape that, surprisingly, is significantly different from that of proofs of proximity for functions. Our main results include showing that MA distribution testers can be quadratically stronger than standard distribution testers, but no stronger than that; in contrast, IP distribution testers can be exponentially stronger than standard distribution testers, but when restricted to public coins they can be at best quadratically stronger.

Scott Aaronson and Avi Wigderson. Algebrization: A new barrier in complexity theory. ACM Transactions on Computation Theory, 1:2:1-2:54, 2009.
Jayadev Acharya, Clément L. Canonne, and Gautam Kamath. A chasm between identity and equivalence testing with conditional queries. In Proceedings of the 19th International Workshop on Randomization and Computation, RANDOM '15, pages 449-466, 2015.
Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, and Shengjun Pan. Competitive closeness testing. In Proceedings of the 24th Annual Conference on Learning Theory, COLT 2011, pages 47-68, 2011.
László Babai, Peter Frankl, and Janos Simon. Complexity classes in communication complexity theory. In Proceedings of the 27th Annual IEEE Symposium on Foundations of Computer Science, FOCS 1986, pages 337-347, 1986.
László Babai and Shlomo Moran. Arthur-merlin games: a randomized proof system, and a hierarchy of complexity classes. Journal of Computer and System Sciences, 36:254-276, 1988.
Tuğkan Batu, Sanjoy Dasgupta, Ravi Kumar, and Ronitt Rubinfeld. The complexity of approximating the entropy. SIAM Journal on Computing, 35(1):132-150, 2005.
Tuğkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In Proceedings of the 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, pages 442-451, 2001.
Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, FOCS 2000, pages 259-269, 2000.
Eli Ben-Sasson, Oded Goldreich, Prahladh Harsha, Madhu Sudan, and Salil P. Vadhan. Robust PCPs of proximity, shorter PCPs, and applications to coding. SIAM Journal on Computing, 36(4):889-974, 2006.
Itay Berman, Ron D. Rothblum, and Vinod Vaikuntanathan. Zero-knowledge proofs of proximity. In Proceedings of the 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, page To appear, 2018.
Bhaswar B. Bhattacharya and Gregory Valiant. Testing closeness with unequal sized samples. In Proceedings of the 2015 Conference on Neural Information Processing Systems, NIPS 2015, pages 2611-2619, 2015.
Arnab Bhattacharyya, Eldar Fischer, Ronitt Rubinfeld, and Paul Valiant. Testing monotonicity of distributions over general partial orders. In Proceedings of the 2nd Innovations in Theoretical Computer Science Conference, ITCS 2011, pages 239-252, 2011.
Eric Blais, Clément L. Canonne, and Tom Gur. Distribution testing lower bounds via reductions from communication complexity (Alice and Bob don't talk to each other anymore.). In Proceedings of the 32th Conference on Computational Complexity, CCC 2017, pages 1-42, 2017.
Kenneth P Burnham and David R Anderson. Model selection and multimodel inference: a practical information-theoretic approach. Springer Science &Business Media, 2003.
Clément Canonne and Ronitt Rubinfeld. Testing probability distributions underlying aggregated data. In International Colloquium on Automata, Languages, and Programming, ICALP '14, pages 283-295, 2014.
Clément L. Canonne. A survey on distribution testing. your data is big. But is it blue?, 2017.
Clément L. Canonne, Elena Grigorescu, Siyao Guo, Akash Kumar, and Karl Wimmer. Testing k-monotonicity. Electronic Colloquium on Computational Complexity (ECCC), 23:136, 2016. URL: http://eccc.hpi-web.de/report/2016/136.
Clément L. Canonne, Dana Ron, and Rocco A. Servedio. Testing probability distributions using conditional samples. SIAM Journal on Computing, 44(3):540-616, 2015.
Amit Chakrabarti, Graham Cormode, Andrew McGregor, and Justin Thaler. Annotations in data streams. ACM Transactions on Algorithms, 11, 2014.
Amit Chakrabarti, Graham Cormode, Andrew McGregor, Justin Thaler, and Suresh Venkatasubramanian. Verifiable stream computation and Arthur-Merlin communication. In Proceedings of the 30th Conference on Computational Complexity, CCC 2015, pages 217-243, 2015.
Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, and Arie Matsliah. On the power of conditional samples in distribution testing. SIAM Journal on Computing, 45(4):1261-1296, 2016.
Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of the 25th Symposium on Discrete Algorithms, SODA 2014, pages 1193-1203, 2014.
Alessandro Chiesa and Tom Gur. Proofs of proximity for distribution testing. Electronic Colloquium on Computational Complexity (ECCC), 24:155, 2017. URL: https://eccc.weizmann.ac.il/report/2017/155.
Ilias Diakonikolas and Daniel M. Kane. A new approach for testing properties of discrete distributions. In Proceedings of the 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, pages 685-694, 2016.
Funda Ergün, Ravi Kumar, and Ronitt Rubinfeld. Fast approximate probabilistically checkable proofs. Information and Computation, 189(2):135-159, 2004.
Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, and Ananda Theertha Suresh. Faster algorithms for testing under conditional sampling. In Conference on Learning Theory, COLT '15, pages 607-636, 2015.
Eldar Fischer, Yonatan Goldhirsh, and Oded Lachish. Partial tests, universal tests and decomposability. In Proceedings of the 5th Innovations in Theoretical Computer Science Conference, ITCS 2014, pages 483-500, 2014.
Eldar Fischer, Oded Lachish, and Yadu Vasudev. Trading query complexity for sample-based testing and multi-testing scalability. In Proceedings of the 56th Symposium on Foundations of Computer Science, FOCS 2015, pages 1163-1182, 2015.
Oded Goldreich. Introduction to property testing, 2017.
Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653-750, 1998.
Oded Goldreich and Tom Gur. Universal locally verifiable codes and 3-round interactive proofs of proximity for CSP. Electronic Colloquium on Computational Complexity (ECCC), 23:192, 2016. URL: http://eccc.hpi-web.de/report/2016/192.
Oded Goldreich, Tom Gur, and Ilan Komargodski. Strong locally testable codes with relaxed local decoders. In Proceedings of the 30th Conference on Computational Complexity, CCC 2015, pages 1-41, 2015.
Oded Goldreich, Tom Gur, and Ron D. Rothblum. Proofs of proximity for context-free languages and read-once branching programs. In Proceedings of the 42nd International Colloquium on Automata, Languages, and Programming, ICALP 2015, pages 666-677, 2015.
Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. In Studies in Complexity and Cryptography, pages 68-75. Springer, 2011.
Oded Goldreich and Or Sheffet. On the randomness complexity of property testing. Computational Complexity, 19(1):99-133, 2010.
Shafi Goldwasser, Silvio Micali, and Charles Rackoff. The knowledge complexity of interactive proof systems. SIAM Journal on computing, 18(1):186-208, 1989.
Shafi Goldwasser and Michael Sipser. Private coins versus public coins in interactive proof systems. In Proceedings of the 18th a Annual ACM Symposium on Theory of Computing, STOC 1986, pages 59-68, 1986.
Tom Gur and Ran Raz. Arthur-Merlin streaming complexity. Information and Computing, 243:145-165, 2015.
Tom Gur and Ron Rothblum. Non-interactive proofs of proximity. Computational Complexity, To appear, 2017.
Tom Gur and Ron D. Rothblum. A hierarchy theorem for interactive proofs of proximity. In Proceedings of the 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, 2017.
Yael Tauman Kalai and Ron D. Rothblum. Arguments of proximity. In Proceedings of the 35th Annual International Cryptology Conference, CRYPTO 2015, pages 422-442, 2015.
Hartmut Klauck. On Arthur Merlin games in communication complexity. In Proceedings of the 26th Conference on Computational Complexity, CCC 2017, pages 189-199, 2011.
Erich L Lehmann and Joseph P Romano. Testing statistical hypotheses. Springer Science &Business Media, 2006.
Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9:295-347, 2013.
Ilan Newman and Mario Szegedy. Public vs. private coin flips in one round communication games. In Proceedings of the 28th a Annual ACM Symposium on Theory of Computing, STOC 1996, pages 561-570, 1996.
Liam Paninski. Estimating entropy on m bins given fewer than m samples. IEEE Transactions on Information Theory, 50(9):2200-2203, 2004.
Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750-4755, 2008.
Aditi Raghunathan, Gregory Valiant, and James Zou. Estimating the unseen from multiple populations. CoRR, abs/1707.03854, 2017. URL: http://arxiv.org/abs/1707.03854.
Sofya Raskhodnikova, Dana Ron, Amir Shpilka, and Adam Smith. Strong lower bounds for approximating distribution support size and the distinct elements problem. SIAM Journal on Computing, 39:813-842, 2009.
Ran Raz and Amir Shpilka. On the power of quantum proofs. In Proceedings of the 19th Conference on Computational Complexity, CCC 2004, pages 260-274, 2004.
Omer Reingold, Ron Rothblum, and Guy Rothblum. Constant-round interactive proofs for delegating computation. In Proceedings of the 48th ACM Symposium on the Theory of Computing, STOC 2016, pages 49-62, 2016.
Guy N. Rothblum, Salil P. Vadhan, and Avi Wigderson. Interactive proofs of proximity: delegating computation in sublinear time. In Proceedings of the 45th Symposium on Theory of Computing, STOC 2013, pages 793-802, 2013.
Ronitt Rubinfeld. Taming big probability distributions. ACM Crossroads, 19(1):24-28, 2012.
Ronitt Rubinfeld and Rocco Servedio. Testing monotone high-dimensional distributions. Random Structures &Algorithms, 34:24-44, 2009.
Ronitt Rubinfeld and Madhu Sudan. Robust characterization of polynomials with applications to program testing. SIAM Journal on Computing, 25(2):252-271, 1996.
Gregory Valiant and Paul Valiant. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of the 43rd Symposium on Theory of Computing, STOC 2011, pages 685-694, 2011.
Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. SIAM Journal on Computing, 46(1):429-455, 2017.
Paul Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing, 40:1927-1968, 2011.
Thomas Vidick and John Watrous. Quantum proofs. Foundations and Trends in Theoretical Computer Science, 11:1-215, 2016.
John Von Neumann. Various techniques used in connection with random digits. National Bureau of Standards Applied Math Series, 12:36-38, 1951.
John Watrous. Succinct quantum proofs for properties of finite groups. In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2000, pages 537-546, 2000.
James Zou, Gregory Valiant, Paul Valiant, Konrad Karczewski, Siu On Chan, Kaitlin Samocha, Monkol Lek, Shamil Sunyaev, Mark Daly, and Daniel G MacArthur. Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects. Nature Communications, 7, 2016.

Proofs of Proximity for Distribution Testing

Authors Alessandro Chiesa, Tom Gur

File

Document Identifiers

Author Details

Cite As Get BibTex

Abstract

Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message