Testing Shape Restrictions of Discrete Distributions

Authors Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, Ronitt Rubinfeld



PDF
Thumbnail PDF

File

LIPIcs.STACS.2016.25.pdf
  • Filesize: 0.79 MB
  • 14 pages

Document Identifiers

Author Details

Clément L. Canonne
Ilias Diakonikolas
Themis Gouleakis
Ronitt Rubinfeld

Cite AsGet BibTex

Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing Shape Restrictions of Discrete Distributions. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 25:1-25:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.STACS.2016.25

Abstract

We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution D over [n] and a property P, the goal is to distinguish between D in P and l_{1}(D,P)>epsilon. We develop a general algorithm for this question, which applies to a large range of "shape-constrained" properties, including monotone, log-concave, t-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases considered, our algorithm has near-optimal sample complexity with regard to the domain size and is computationally efficient. For most of these classes, we provide the first non-trivial tester in the literature. In addition, we also describe a generic method to prove lower bounds for this problem, and use it to show our upper bounds are nearly tight. Finally, we extend some of our techniques to tolerant testing, deriving nearly-tight upper and lower bounds for the corresponding questions.
Keywords
  • property testing
  • probability distributions
  • statistics
  • lower bounds

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jayadev Acharya and Constantinos Daskalakis. Testing Poisson Binomial Distributions. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 1829-1840, 2015. Google Scholar
  2. Jayadev Acharya, Constantinos Daskalakis, and Gautam C. Kamath. Optimal Testing for Properties of Distributions. In C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama, R. Garnett, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3577-3598. Curran Associates, Inc., 2015. URL: http://papers.nips.cc/paper/5839-optimal-testing-for-properties-of-distributions.pdf.
  3. Jayadev Acharya, Ilias Diakonikolas, Jerry Zheng Li, and Ludwig Schmidt. Sample-optimal density estimation in nearly-linear time. CoRR, abs/1506.00671, 2015. URL: http://arxiv.org/abs/1506.00671.
  4. Noga Alon, Alexandr Andoni, Tali Kaufman, Kevin Matulef, Ronitt Rubinfeld, and Ning Xie. Testing k-wise and almost k-wise independence. In Proceedings of the 39th ACM Symposium on Theory of Computing, STOC 2007, San Diego, California, USA, June 11-13, 2007, pages 496-505, New York, NY, USA, 2007. URL: http://dx.doi.org/10.1145/1250790.1250863.
  5. Mark Y. An. Log-concave probability distributions: theory and statistical testing. Technical report, Centre for Labour Market and Social Research, Denmark, 1996. URL: http://EconPapers.repec.org/RePEc:fth:clmsre:96-01.
  6. Mark Bagnoli and Ted Bergstrom. Log-concave probability and its applications. Economic Theory, 26(2):445-469, 2005. URL: http://dx.doi.org/10.1007/s00199-004-0514-4,
  7. Richard E. Barlow, Bartholomew D.J, J.M Bremner, and H.D Brunk. Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley Series in Probability and Mathematical Statistics. J. Wiley, London, New York, 1972. Google Scholar
  8. Tuğkan Batu, Sanjoy Dasgupta, Ravi Kumar, and Ronitt Rubinfeld. The complexity of approximating the entropy. SIAM Journal on Computing, 35(1):132-150, 2005. Google Scholar
  9. Tuğkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In 42nd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2001, Las Vegas, Nevada, USA, October 14-17 2001, pages 442-451, 2001. Google Scholar
  10. Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In 41st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2000, Redondo Beach, California, USA, November 12-14 2000, pages 259-269, 2000. Google Scholar
  11. Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of the 36th ACM Symposium on Theory of Computing, STOC 2004, Chicago, IL, USA, June 13-16, 2004, pages 381-390, New York, NY, USA, 2004. ACM. URL: http://dx.doi.org/10.1145/1007352.1007414.
  12. Clément L. Canonne. A Survey on Distribution Testing: your data is Big. But is it Blue? Electronic Colloquium on Computational Complexity (ECCC), 22:63, April 2015. Google Scholar
  13. Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Learning mixtures of structured distributions over discrete domains. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 1380-1394, 2013. Google Scholar
  14. Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Efficient density estimation via piecewise polynomial approximation. In Proceedings of the 45th ACM Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 604-613. ACM, 2014. Google Scholar
  15. Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Near-optimal density estimation in near-linear time using variable-width histograms. In Annual Conference on Neural Information Processing Systems (NIPS), pages 1844-1852, 2014. Google Scholar
  16. Siu-on Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1193-1203, 2014. Google Scholar
  17. Constantinos Daskalakis, Ilias Diakonikolas, and Rocco A. Servedio. Learning k-modal distributions via testing. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1371-1385. Society for Industrial and Applied Mathematics (SIAM), 2012. Google Scholar
  18. Constantinos Daskalakis, Ilias Diakonikolas, Rocco A. Servedio, Gregory Valiant, and Paul Valiant. Testing k-modal distributions: Optimal algorithms via reductions. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 1833-1852. Society for Industrial and Applied Mathematics (SIAM), 2013. URL: http://dl.acm.org/citation.cfm?id=2627817.2627948.
  19. Ilias Diakonikolas. Learning structured distributions. In Handbook of Big Data. CRC Press, 2016. Google Scholar
  20. Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. Optimal algorithms and lower bounds for testing closeness of structured distributions. In 56th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2015, 2015. Google Scholar
  21. Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. Testing Identity of Structured Distributions. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, 2015. Google Scholar
  22. Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Technical Report TR00-020, Electronic Colloquium on Computational Complexity (ECCC), 2000. Google Scholar
  23. Sudipto Guha, Andrew McGregor, and Suresh Venkatasubramanian. Streaming and sublinear approximation of entropy and information distances. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 733-742, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics (SIAM). URL: http://dl.acm.org/citation.cfm?id=1109557.1109637.
  24. Philip Hougaard. Survival models for heterogeneous populations derived from stable distributions. Biometrika, 73:397-96, 1986. Google Scholar
  25. Piotr Indyk, Reut Levi, and Ronitt Rubinfeld. Approximating and Testing k-Histogram Distributions in Sub-linear Time. In Proceedings of PODS, pages 15-22, 2012. Google Scholar
  26. Benoit Mandelbrot. New methods in statistical economics. Journal of Political Economy, 71(5):pp. 421-440, 1963. Google Scholar
  27. Pascal Massart and Jean Picard. Concentration inequalities and model selection. Lecture Notes in Mathematics. Springer, 33, 2003, Saint-Flour, Cantal, 2007. URL: http://opac.inria.fr/record=b1122538.
  28. Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750-4755, 2008. Google Scholar
  29. Dana Ron. Property Testing: A Learning Theory Perspective. Foundations and Trends in Machine Learning, 1(3):307-402, 2008. Google Scholar
  30. Dana Ron. Algorithmic and analysis techniques in property testing. Foundations and Trends in Theoretical Computer Science, 5:73-205, 2010. Google Scholar
  31. Ronitt Rubinfeld. Taming Big Probability Distributions. XRDS, 19(1):24-28, September 2012. URL: http://dx.doi.org/10.1145/2331042.2331052.
  32. Debasis Sengupta and Asok K. Nanda. Log-concave and concave distributions in reliability. Naval Research Logistics (NRL), 46(4):419-433, 1999. URL: http://dx.doi.org/10.1002/(SICI)1520-6750(199906)46:4<419::AID-NAV5>3.0.CO;2-B.
  33. Mervyn J. Silvapulle and Pranab K. Sen. Constrained Statistical Inference. John Wiley &Sons, Inc., 2001. URL: http://dx.doi.org/10.1002/9781118165614.fmatter.
  34. Constantino Tsallis, Silvio V. F. Levy, André M. C. Souza, and Roger Maynard. Statistical-mechanical foundation of the ubiquity of Lévy distributions in nature. Phys. Rev. Lett., 75:3589-3593, Nov 1995. URL: http://dx.doi.org/10.1103/PhysRevLett.75.3589.
  35. Gregory Valiant and Paul Valiant. A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC), 17:179, 2010. Google Scholar
  36. Gregory Valiant and Paul Valiant. Estimating the unseen: An n/log n-sample estimator for entropy and support size, shown optimal via new clts. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pages 685-694, 2011. Google Scholar
  37. Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. In 55th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, 2014. Google Scholar
  38. Paul Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing, 40(6):1927-1968, 2011. Google Scholar
  39. Guenther Walther. Inference and modeling with log-concave distributions. Statistical Science, 24(3):319-327, 2009. URL: http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss/1270041258.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail