Entropy Matters: Understanding Performance of Sparse Random Embeddings

Skorski, Maciej

doi:10.4230/LIPIcs.ISAAC.2022.18

Abstract

This work shows how the performance of sparse random embeddings depends on a Rényi entropy-like property of the data, improving upon recent works from NIPS'18 and NIPS'19. While the prior works relied on involved combinatorics, the novel approach is simpler and modular. As building blocks, it develops the following probabilistic facts of general interest: a) a comparison inequality between linear and quadratic chaos; b) a comparison inequality between heterogeneous and homogeneous linear chaos; c) a simpler proof of Latała's strong result on estimating distributions of IID sums; d) sharp bounds for binomial moments in all parameter regimes.
