Entropy Matters: Understanding Performance of Sparse Random Embeddings

Author Maciej Skorski

Thumbnail PDF


  • Filesize: 0.69 MB
  • 18 pages

Document Identifiers

Author Details

Maciej Skorski
  • University of Luxembourg, Luxembourg

Cite AsGet BibTex

Maciej Skorski. Entropy Matters: Understanding Performance of Sparse Random Embeddings. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


This work shows how the performance of sparse random embeddings depends on the Renyi entropy-like property of data, improving upon recent works from NIPS'18 and NIPS'19. While the prior works relied on involved combinatorics, the novel approach is simpler and modular. As the building blocks, it develops the following probabilistic facts of general interest: b) a comparison inequality between the linear and quadratic chaos c) a comparison inequality between heterogenic and homogenic linear chaos d) a simpler proof of Latala’s strong result on estimating distributions of IID sums e) sharp bounds for binomial moments in all parameter regimes.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Probabilistic algorithms
  • Theory of computation → Random projections and metric embeddings
  • Random Embeddings
  • Sparse Projections
  • Renyi Entropy


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Dimitris Achlioptas. Database-friendly random projections. In Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 274-281, 2001. Google Scholar
  2. Thomas D Ahle. Asymptotic tail bound and applications, 2017. Google Scholar
  3. Nir Ailon and Bernard Chazelle. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pages 557-563, 2006. Google Scholar
  4. Akiko Aizawa. An information-theoretic perspective of tf-idf measures. Information Processing & Management, 39(1):45-65, 2003. Google Scholar
  5. SN Bernshtein. Probability theory (in Russian). Gosizdat, Moscow-Leningrad, 1927. Google Scholar
  6. Gérard Biau, Luc Devroye, and Gábor Lugosi. On the performance of clustering in Hilbert spaces. IEEE Transactions on Information Theory, 54(2):781-790, 2008. Google Scholar
  7. Ella Bingham and Heikki Mannila. Random projection in dimensionality reduction: applications to image and text data. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 245-250, 2001. Google Scholar
  8. Henry W Block, Thomas H Savits, Moshe Shaked, et al. Some concepts of negative dependence. The Annals of Probability, 10(3):765-772, 1982. Google Scholar
  9. Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss transform itself preserves differential privacy. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 410-419. IEEE, 2012. Google Scholar
  10. Stéphane Boucheron, Olivier Bousquet, Gábor Lugosi, Pascal Massart, et al. Moment inequalities for functions of independent random variables. The Annals of Probability, 33(2):514-560, 2005. Google Scholar
  11. Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford university press, 2013. Google Scholar
  12. Christos Boutsidis, Anastasios Zouzias, and Petros Drineas. Random projections for k-means clustering. In Advances in Neural Information Processing Systems, pages 298-306, 2010. Google Scholar
  13. V Buldygin and K Moskvichova. The sub-gaussian norm of a binary random variable. Theory of probability and mathematical statistics, 86:33-49, 2013. Google Scholar
  14. Christian Cachin. Smooth entropy and rényi entropy. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 193-208. Springer, 1997. URL: https://link.springer.com/chapter/10.1007/3-540-69053-0_14.
  15. Herman Chernoff et al. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493-507, 1952. Google Scholar
  16. Kenneth L Clarkson and David P Woodruff. Low-rank approximation and regression in input sparsity time. Journal of the ACM (JACM), 63(6):1-45, 2017. Google Scholar
  17. M Lawrence Clevenson and William Watkins. Majorization and the birthday inequality. Mathematics Magazine, 64(3):183-188, 1991. Google Scholar
  18. Michael B Cohen. Nearly tight oblivious subspace embeddings by trace inequalities. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pages 278-287. SIAM, 2016. Google Scholar
  19. Michael B Cohen, TS Jayram, and Jelani Nelson. Simple analyses of the sparse Johnson-Lindenstrauss transform. In 1st Symposium on Simplicity in Algorithms (SOSA 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Google Scholar
  20. Harald Cramér. On a new limit theorem of the theory of probability. Uspekhi Matematicheskikh Nauk, 10:166-178, 1944. Google Scholar
  21. Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. A sparse Johnson-Lindenstrauss transform. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 341-350, 2010. Google Scholar
  22. Sanjoy Dasgupta. Learning mixtures of gaussians. In 40th Annual Symposium on Foundations of Computer Science (Cat. No. 99CB37039), pages 634-644. IEEE, 1999. Google Scholar
  23. Sanjoy Dasgupta and Anupam Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. International Computer Science Institute, Technical Report, 22(1):1-5, 1999. Google Scholar
  24. Victor H de la Peña and Stephen J Montgomery-Smith. Decoupling inequalities for the tail probabilities of multivariate u-statistics. The Annals of Probability, pages 806-816, 1995. Google Scholar
  25. Devdatt P Dubhashi and Desh Ranjan. Balls and bins: A study in negative dependence. BRICS Report Series, 3(25), 1996. Google Scholar
  26. Morris L Eaton. A note on symmetric bernoulli random variables. The annals of mathematical statistics, 41(4):1223-1226, 1970. Google Scholar
  27. Cees M Fortuin, Pieter W Kasteleyn, and Jean Ginibre. Correlation inequalities on some partially ordered sets. Communications in Mathematical Physics, 22(2):89-103, 1971. Google Scholar
  28. Peter Frankl and Hiroshi Maehara. The Johnson-Lindenstrauss lemma and the sphericity of some graphs. Journal of Combinatorial Theory, Series B, 44(3):355-362, 1988. Google Scholar
  29. Peter Frankl and Hiroshi Maehara. Some geometric applications of the beta distribution. Annals of the Institute of Statistical Mathematics, 42(3):463-474, 1990. Google Scholar
  30. Casper B Freksen, Lior Kamma, and Kasper Green Larsen. Fully understanding the hashing trick. In Advances in Neural Information Processing Systems, pages 5389-5399, 2018. Google Scholar
  31. David Lee Hanson and Farroll Tim Wright. A bound on tail probabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics, 42(3):1079-1083, 1971. Google Scholar
  32. G.H. Hardy, Karreman Mathematics Research Collection, J.E. Littlewood, G. Pólya, G. Pólya, and D.E. Littlewood. Inequalities. Cambridge Mathematical Library. Cambridge University Press, 1952. URL: https://books.google.at/books?id=t1RCSP8YKt8C.
  33. Paweł Hitczenko. Domination inequality for martingale transforms of a rademacher sequence. Israel Journal of Mathematics, 84(1-2):161-178, 1993. Google Scholar
  34. Wassily Hoeffding. Probability inequalities for sums of bounded random variables. In The Collected Works of Wassily Hoeffding, pages 409-426. Springer, 1994. Google Scholar
  35. Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604-613, 1998. Google Scholar
  36. Meena Jagadeesan. Understanding sparse jl for feature hashing. In Advances in Neural Information Processing Systems, pages 15203-15213, 2019. URL: http://arxiv.org/abs/1903.03605.
  37. Thathachar S Jayram and David P Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Transactions on Algorithms (TALG), 9(3):1-17, 2013. Google Scholar
  38. William B Johnson and Joram Lindenstrauss. Extensions of lipschitz mappings into a Hilbert space. Contemporary mathematics, 26(189-206):1, 1984. Google Scholar
  39. William B Johnson and Assaf Naor. The Johnson-Lindenstrauss lemma almost characterizes Hilbert space, but not quite. Discrete & Computational Geometry, 43(3):542-553, 2010. Google Scholar
  40. Daniel Kane, Raghu Meka, and Jelani Nelson. Almost optimal explicit Johnson-Lindenstrauss families. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 628-639. Springer, 2011. Google Scholar
  41. Daniel M Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. Journal of the ACM (JACM), 61(1):1-23, 2014. Google Scholar
  42. Krishnaram Kenthapadi, Aleksandra Korolova, Ilya Mironov, and Nina Mishra. Privacy via the Johnson-Lindenstrauss transform. Journal of Privacy and Confidentiality, 5(1):39-71, 2013. Google Scholar
  43. Michael Kerber and Sharath Raghvendra. Approximation and streaming algorithms for projective clustering via random projections. arXiv preprint, 2014. URL: http://arxiv.org/abs/1407.2063.
  44. Aleksandr Khintchine. Über dyadische brüche. Mathematische Zeitschrift, 18(1):109-116, 1923. Google Scholar
  45. Andreas Knoblauch. Closed-form expressions for the moments of the binomial probability distribution. SIAM Journal on Applied Mathematics, 69(1):197-204, 2008. Google Scholar
  46. Konrad Kolesko and Rafał Latała. Moment estimates for chaoses generated by symmetric random variables with logarithmically convex tails. Statistics & Probability Letters, 107:210-214, 2015. Google Scholar
  47. Samory Kpotufe and Bharath Sriperumbudur. Gaussian sketching yields a jl lemma in rkhs. In International Conference on Artificial Intelligence and Statistics, pages 3928-3937, 2020. Google Scholar
  48. Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images, 2009. Google Scholar
  49. Rafał Latała. Tail and moment estimates for some types of chaos. Studia mathematica, 135(1):39-53, 1999. Google Scholar
  50. Rafał Latała et al. Estimation of moments of sums of independent real random variables. The Annals of Probability, 25(3):1502-1513, 1997. Google Scholar
  51. Ping Li, Trevor J Hastie, and Kenneth W Church. Very sparse random projections. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 287-296, 2006. Google Scholar
  52. Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215-245, 1995. Google Scholar
  53. John E Littlewood. On the probability in the tail of a binomial distribution. Advances in Applied Probability, 1(1):43-72, 1969. Google Scholar
  54. Konstantin Makarychev, Yury Makarychev, and Ilya Razenshteyn. Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 1027-1038, 2019. Google Scholar
  55. Jiří Matoušek. On variants of the Johnson-Lindenstrauss lemma. Random Structures & Algorithms, 33(2):142-156, 2008. Google Scholar
  56. Brendan D McKay. On littlewood’s estimate for the binomial distribution. Advances in Applied Probability, 21(2):475-478, 1989. Google Scholar
  57. Jelani Nelson and Huy L Nguyên. Osnap: Faster numerical linear algebra algorithms via sparser subspace embeddings. In 2013 ieee 54th annual symposium on foundations of computer science, pages 117-126. IEEE, 2013. Google Scholar
  58. Alfréd Rényi et al. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. The Regents of the University of California, 1961. Google Scholar
  59. Herbert Robbins. A remark on Stirling’s formula. The American mathematical monthly, 62(1):26-29, 1955. Google Scholar
  60. Mark Rudelson, Roman Vershynin, et al. Hanson-wright inequality and sub-gaussian concentration. Electronic Communications in Probability, 18, 2013. Google Scholar
  61. Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 143-152. IEEE, 2006. Google Scholar
  62. Pantelimon Stanica. Good lower and upper bounds on binomial coefficients. Journal of Inequalities in Pure and Applied Mathematics, 2(3):30, 2001. Google Scholar
  63. Roman Vershynin. A simple decoupling inequality in probability theory. preprint, 2011. Google Scholar
  64. Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018. Google Scholar
  65. Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th annual international conference on machine learning, pages 1113-1120, 2009. Google Scholar
  66. Alfred Witkowski. A new proof of the monotonicity of power means. J. Ineq. Pure and Appl. Math, 5(1), 2004. Google Scholar
  67. Shuheng Zhou. Sparse Hanson-Wright inequalities for subgaussian quadratic forms. Bernoulli, 25(3):1603-1639, 2019. appears in 2015 at URL: https://arxiv.org/abs/1510.05517.
Questions / Remarks / Feedback

Feedback for Dagstuhl Publishing

Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail