Density Estimation for Shift-Invariant Multidimensional Distributions

Authors: Anindya De, Philip M. Long, Rocco A. Servedio


Author Details

Anindya De
  • EECS Department, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA
Philip M. Long
  • Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
Rocco A. Servedio
  • Department of Computer Science, Columbia University, 500 W. 120th Street, Room 450, New York, NY 10027, USA

Cite As

Anindya De, Philip M. Long, and Rocco A. Servedio. Density Estimation for Shift-Invariant Multidimensional Distributions. In 10th Innovations in Theoretical Computer Science Conference (ITCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 124, pp. 28:1-28:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

We study density estimation for classes of shift-invariant distributions over R^d. A multidimensional distribution is "shift-invariant" if, roughly speaking, it is close in total variation distance to any small shift of itself. Shift-invariance relaxes the smoothness assumptions commonly used in non-parametric density estimation and, in particular, allows jump discontinuities.

The different classes of distributions that we consider correspond to different rates of tail decay. For each such class we give an efficient algorithm that learns any distribution in the class, with respect to total variation distance, from independent samples. As a special case of our general result, we show that d-dimensional shift-invariant distributions satisfying an exponential tail bound can be learned to total variation distance epsilon using O~_d(1/epsilon^{d+2}) samples and O~_d(1/epsilon^{2d+2}) time. This implies that, for constant d, multivariate log-concave distributions can be learned in O~_d(1/epsilon^{2d+2}) time using O~_d(1/epsilon^{d+2}) samples, answering a question of [Diakonikolas et al., 2016].

All of our results extend to a model of noise-tolerant density estimation based on Huber's contamination model, in which the target distribution to be learned is a (1-epsilon, epsilon) mixture of some unknown distribution in the class with some other arbitrary and unknown distribution, and the learning algorithm must output a hypothesis distribution with total variation distance error O(epsilon) from the target distribution.

We show that our general results are close to best possible by proving a simple Omega(1/epsilon^d) information-theoretic lower bound on sample complexity, which holds even for learning bounded distributions that are shift-invariant.
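One natural way to make the shift-invariance condition precise (a sketch consistent with the informal description above, not necessarily the paper's exact parameterization) is the following:

```latex
% A distribution over R^d with density f is kappa-shift-invariant if,
% for every unit vector v in R^d and every shift magnitude delta > 0,
% the total variation distance between f and its translate by delta*v
% grows at most linearly in delta:
\[
  d_{\mathrm{TV}}\bigl(f(x),\, f(x - \delta v)\bigr)
  \;=\; \frac{1}{2} \int_{\mathbb{R}^d} \bigl|\, f(x) - f(x - \delta v) \,\bigr| \, dx
  \;\le\; \kappa \, \delta .
\]
```

Note that a condition of this form is satisfied, for example, by the uniform density on an axis-aligned box, even though that density has jump discontinuities; this is the sense in which shift-invariance is weaker than the smoothness assumptions it replaces.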

ACM Subject Classification
  • Theory of computation → Unsupervised learning and clustering

Keywords and phrases
  • Density estimation
  • unsupervised learning
  • log-concave distributions
  • non-parametrics




  1. J. Acharya, C. Daskalakis, and G. Kamath. Optimal Testing for Properties of Distributions. In NIPS, pages 3591-3599, 2015.
  2. J. Acharya, A. Jafarpour, A. Orlitsky, and A.T. Suresh. Near-optimal-sample estimators for spherical Gaussian mixtures, 2014.
  3. Jayadev Acharya, Ilias Diakonikolas, Jerry Li, and Ludwig Schmidt. Sample-optimal density estimation in nearly-linear time. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1278-1289. SIAM, 2017.
  4. Maria-Florina Balcan and Philip M. Long. Active and passive learning of linear separators under log-concave distributions. In Conference on Learning Theory, pages 288-316, 2013.
  5. Andrew D. Barbour and Aihua Xia. Poisson perturbations. ESAIM: Probability and Statistics, 3:131-150, 1999.
  6. Andrew R. Barron and Thomas M. Cover. Minimum complexity density estimation. IEEE Transactions on Information Theory, 37(4):1034-1054, 1991.
  7. O.V. Besov. On a family of function spaces. Embedding and extension theorems. Dokl. Akad. Nauk SSSR, 126(6):1163-1165, 1959.
  8. Aditya Bhaskara, Ananda Suresh, and Morteza Zadimoghaddam. Sparse solutions to nonnegative linear systems and applications. In Artificial Intelligence and Statistics, pages 83-92, 2015.
  9. C. L. Canonne, I. Diakonikolas, T. Gouleakis, and R. Rubinfeld. Testing Shape Restrictions of Discrete Distributions. In STACS, pages 25:1-25:14, 2016.
  10. T. Carpenter, I. Diakonikolas, A. Sidiropoulos, and A. Stewart. Near-Optimal Sample Complexity Bounds for Maximum Likelihood Estimation of Multivariate Log-concave Densities. CoRR, abs/1802.10575, 2018.
  11. S. Chan, I. Diakonikolas, R. Servedio, and X. Sun. Learning mixtures of structured distributions over discrete domains. In SODA, pages 1380-1394, 2013.
  12. S. Chan, I. Diakonikolas, R. Servedio, and X. Sun. Efficient Density Estimation via Piecewise Polynomial Approximation. In STOC, pages 604-613, 2014.
  13. L. Chen, L. Goldstein, and Q.-M. Shao. Normal Approximation by Stein's Method. Springer, 2011.
  14. S. Dasgupta. Learning mixtures of Gaussians. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science, pages 634-644, 1999.
  15. C. Daskalakis, A. De, G. Kamath, and C. Tzamos. A size-free CLT for Poisson multinomials and its applications. In STOC, pages 1074-1086, 2016.
  16. C. Daskalakis, I. Diakonikolas, R. O'Donnell, R. A. Servedio, and L. Tan. Learning sums of independent integer random variables. In FOCS, pages 217-226, 2013.
  17. C. Daskalakis, I. Diakonikolas, and R.A. Servedio. Learning Poisson Binomial Distributions. In STOC, pages 709-728, 2012.
  18. C. Daskalakis and G. Kamath. Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians. In COLT, pages 1183-1213, 2014.
  19. A. De, I. Diakonikolas, and R. Servedio. Learning from Satisfying Assignments. In Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 478-497, 2015.
  20. A. De, P. M. Long, and R. A. Servedio. Density estimation for shift-invariant multidimensional distributions, 2018.
  21. A. De, P. M. Long, and R. A. Servedio. Learning Sums of Independent Random Variables with Sparse Collective Support. In FOCS, 2018.
  22. Ronald A. DeVore and Robert C. Sharpley. Besov spaces on domains in R^d. Transactions of the American Mathematical Society, 335(2):843-864, 1993.
  23. L. Devroye and L. Györfi. Nonparametric Density Estimation: The L₁ View. John Wiley & Sons, 1985.
  24. Luc Devroye and Gábor Lugosi. Combinatorial Methods in Density Estimation. Springer Science & Business Media, 2012.
  25. I. Diakonikolas, D. M. Kane, and A. Stewart. Efficient Robust Proper Learning of Log-concave Distributions. CoRR, abs/1606.03077, 2016.
  26. I. Diakonikolas, D. M. Kane, and A. Stewart. Optimal learning via the Fourier transform for sums of independent integer random variables. In COLT, pages 831-849, 2016.
  27. I. Diakonikolas, D. M. Kane, and A. Stewart. The Fourier transform of Poisson multinomial distributions and its algorithmic applications. In STOC, pages 1060-1073, 2016.
  28. Ilias Diakonikolas, Daniel M. Kane, and Jelani Nelson. Bounded independence fools degree-2 threshold functions. In FOCS, 2010.
  29. Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Learning multivariate log-concave distributions. In Conference on Learning Theory (COLT), pages 711-727, 2016.
  30. Lasse Holmström and Jussi Klemelä. Asymptotic bounds for the expected L1 error of a multivariate kernel density estimator. Journal of Multivariate Analysis, 42(2):245-266, 1992.
  31. Il'dar Abdullovich Ibragimov. On the composition of unimodal distributions. Theory of Probability & Its Applications, 1(2):255-260, 1956.
  32. S. Johnson. Saddle-point integration of C-infinity "bump" functions. arXiv preprint, 2015.
  33. A. T. Kalai, A. Moitra, and G. Valiant. Efficiently learning mixtures of two Gaussians. In STOC, pages 553-562, 2010.
  34. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of sketching and streaming small norms. In SODA, 2010.
  35. A.K.H. Kim and R.J. Samworth. Global rates of convergence in log-concave density estimation. arXiv preprint, 2014.
  36. Jussi Klemelä. Smoothing of Multivariate Data: Density Estimation and Visualization. Wiley Publishing, 2009.
  37. László Lovász and Santosh Vempala. The geometry of logconcave functions and sampling algorithms. Random Struct. Algorithms, 30(3):307-358, 2007.
  38. Elias Masry. Multivariate probability density estimation by wavelet methods: Strong consistency and rates for stationary time series. Stochastic Processes and their Applications, 67(2):177-193, 1997.
  39. A. Moitra and G. Valiant. Settling the polynomial learnability of mixtures of Gaussians. In FOCS, pages 93-102, 2010.
  40. A. Saumard and J.A. Wellner. Log-Concavity and Strong Log-Concavity: a review. Technical report, arXiv, 23 April 2014.
  41. Winthrop W. Smith and Joanne M. Smith. Handbook of Real-Time Fast Fourier Transforms. IEEE, New York, 1995.
  42. Sergej Lvovich Sobolev. On a theorem of functional analysis. Am. Math. Soc. Transl., 34:39-68, 1963.
  43. Rebecca M. Willett and Robert D. Nowak. Multiscale Poisson intensity and density estimation. IEEE Transactions on Information Theory, 53(9):3171-3187, 2007.
  44. Y. G. Yatracos. Rates of convergence of minimum distance estimators and Kolmogorov's entropy. Annals of Statistics, 13:768-774, 1985.