Barriers for Faster Dimensionality Reduction

Authors Ora Nova Fandina , Mikael Møller Høgsgaard , Kasper Green Larsen



Author Details

Ora Nova Fandina
  • Aarhus University, Denmark
Mikael Møller Høgsgaard
  • Aarhus University, Denmark
Kasper Green Larsen
  • Aarhus University, Denmark

Cite As

Ora Nova Fandina, Mikael Møller Høgsgaard, and Kasper Green Larsen. Barriers for Faster Dimensionality Reduction. In 40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 254, pp. 31:1-31:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


The Johnson-Lindenstrauss transform allows one to embed a dataset of n points in ℝ^d into ℝ^m, while preserving the distance between any pair of points up to a factor (1 ± ε), provided that m = Ω(ε^{-2} lg n). The transform has found an overwhelming number of algorithmic applications, allowing one to speed up algorithms and reduce memory consumption at the price of a small loss in accuracy. A central line of research on such transforms focuses on developing fast embedding algorithms, with the classic example being the Fast JL transform by Ailon and Chazelle. All known such algorithms have an embedding time of Ω(d lg d), but no lower bounds rule out a clean O(d) embedding time. In this work, we establish the first non-trivial lower bounds (of magnitude Ω(m lg m)) for a large class of embedding algorithms, including in particular most known upper bounds.
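To make the guarantee concrete, here is a minimal sketch of the classical dense Gaussian instantiation of the JL transform (not the fast structured transforms this paper lower-bounds): each point is multiplied by a random m × d Gaussian matrix with m on the order of ε^{-2} lg n. The constant 8 in the target dimension is illustrative, not taken from the paper.

```python
import numpy as np

def jl_transform(points, eps, seed=None):
    """Embed n points in R^d into R^m, m = O(eps^-2 log n), via a dense
    Gaussian projection. Pairwise distances are preserved up to a factor
    (1 +/- eps) with high probability. The constant 8 is illustrative."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    m = int(np.ceil(8 * np.log(n) / eps**2))
    # Entries N(0, 1/m) so that E[||Ax||^2] = ||x||^2.
    A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d))
    # Dense matrix-vector products: Theta(m * d) time per point; the fast
    # transforms discussed in the paper aim to beat this.
    return points @ A.T

# Usage: embed 50 points from R^1000 and compare pairwise distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1000))
Y = jl_transform(X, eps=0.5, seed=1)
```

Note that the embedding time of this dense construction is Θ(md) per point; the Ω(d lg d)-time algorithms referenced in the abstract trade the dense matrix for structured (e.g. Hadamard-based) ones.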

Subject Classification

ACM Subject Classification
  • Theory of computation → Random projections and metric embeddings

Keywords
  • Dimensionality reduction
  • Lower bound
  • Linear circuits




  1. Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci., 66(4):671-687, 2003.
  2. Nir Ailon and Bernard Chazelle. The fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput., 39:302-322, 2009.
  3. Nir Ailon and Edo Liberty. Fast dimension reduction using Rademacher series on dual BCH codes. In Shang-Hua Teng, editor, Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008, pages 1-9. SIAM, 2008.
  4. Noga Alon, Rina Panigrahy, and Sergey Yekhanin. Deterministic approximation algorithms for the nearest codeword problem. In Irit Dinur, Klaus Jansen, Joseph Naor, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 12th International Workshop, APPROX 2009, and 13th International Workshop, RANDOM 2009, Berkeley, CA, USA, August 21-23, 2009. Proceedings, volume 5687 of Lecture Notes in Computer Science, pages 339-351. Springer, 2009.
  5. Rosa I. Arriaga and Santosh S. Vempala. An algorithmic theory of learning: Robust concepts and random projection. Mach. Learn., 63(2):161-182, 2006.
  6. Stefan Bamberger and Felix Krahmer. Optimal fast Johnson-Lindenstrauss embeddings for large data sets. Sampling Theory, Signal Processing, and Data Analysis, 19(1):3, 2021.
  7. Bernard Chazelle. A spectral approach to lower bounds with applications to geometric searching. SIAM Journal on Computing, 27(2):545-556, 1998.
  8. Thong T. Do, Lu Gan, Yi Chen, Nam Nguyen, and Trac D. Tran. Fast and efficient dimensionality reduction using structurally random matrices. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1821-1824, 2009.
  9. Zeev Dvir, Alexander Golovnev, and Omri Weinstein. Static data structure lower bounds imply rigidity. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 967-978. ACM, 2019.
  10. Ora Nova Fandina, Mikael Møller Høgsgaard, and Kasper Green Larsen. The fast Johnson-Lindenstrauss transform is even faster. CoRR, abs/2204.01800, 2022.
  11. Casper Freksen, Lior Kamma, and Kasper Green Larsen. Fully understanding the hashing trick. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 5394-5404, Red Hook, NY, USA, 2018. Curran Associates Inc.
  12. Casper Benjamin Freksen and Kasper Green Larsen. On using Toeplitz and circulant matrices for Johnson-Lindenstrauss transforms. Algorithmica, 82(2):338-354, 2020.
  13. Joel Friedman. A note on matrix rigidity. Comb., 13(2):235-239, 1993.
  14. Aicke Hinrichs and Jan Vybíral. Johnson-Lindenstrauss lemma for circulant matrices. Random Structures & Algorithms, 39(3):391-398, 2011.
  15. Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, pages 604-613, New York, NY, USA, 1998. Association for Computing Machinery.
  16. Meena Jagadeesan. Understanding sparse JL for feature hashing. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 15177-15187, 2019.
  17. Vishesh Jain, Natesh S. Pillai, and Aaron Smith. Kac meets Johnson and Lindenstrauss: a memory-optimal, fast Johnson-Lindenstrauss transform. CoRR, abs/2003.10069, 2020. To appear in Annals of Applied Probability.
  18. William Johnson and Joram Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26:189-206, January 1984.
  19. Mark Kac. Foundations of kinetic theory. In Proceedings of The Third Berkeley Symposium on Mathematical Statistics and Probability, pages 171-197. University of California Press, Berkeley and Los Angeles, California, 1958.
  20. Daniel M. Kane, Raghu Meka, and Jelani Nelson. Almost optimal explicit Johnson-Lindenstrauss families. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 14th International Workshop, APPROX 2011, and 15th International Workshop, RANDOM 2011, Princeton, NJ, USA, August 17-19, 2011. Proceedings, pages 628-639, 2011.
  21. Daniel M. Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. J. ACM, 61(1):4:1-4:23, 2014.
  22. Felix Krahmer and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. SIAM J. Math. Anal., 43(3):1269-1281, 2011.
  23. Kasper Green Larsen. Constructive discrepancy minimization with hereditary L2 guarantees. In Rolf Niedermeier and Christophe Paul, editors, 36th International Symposium on Theoretical Aspects of Computer Science, STACS 2019, March 13-16, 2019, Berlin, Germany, volume 126 of LIPIcs, pages 48:1-48:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  24. Kasper Green Larsen and Jelani Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, pages 82:1-82:11, 2016.
  25. Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 633-638. IEEE Computer Society, 2017.
  26. Jacques Morgenstern. On linear algorithms. In Zvi Kohavi and Azaria Paz, editors, Theory of Machines and Computations, pages 59-66. Academic Press, 1971.
  27. Jacques Morgenstern. Note on a lower bound on the linear complexity of the fast Fourier transform. J. ACM, 20:305-306, 1973.
  28. Jelani Nelson and Huy L. Nguyen. Sparsity lower bounds for dimensionality reducing maps. In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, Symposium on Theory of Computing Conference, STOC'13, Palo Alto, CA, USA, June 1-4, 2013, pages 101-110. ACM, 2013.
  29. Pavel Pudlák. A note on the use of determinant for proving lower bounds on the size of linear circuits. Information Processing Letters, 74(5):197-201, 2000.
  30. Shubhangi Saraf and Sergey Yekhanin. Noisy interpolation of sparse polynomials, and applications. In Proceedings of the 26th Annual IEEE Conference on Computational Complexity, CCC 2011, San Jose, California, USA, June 8-10, 2011, pages 86-92. IEEE Computer Society, 2011.
  31. Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. In Jozef Gruska, editor, Mathematical Foundations of Computer Science 1977, 6th Symposium, Tatranska Lomnica, Czechoslovakia, September 5-9, 1977, Proceedings, volume 53 of Lecture Notes in Computer Science, pages 162-176. Springer, 1977.
  32. Jan Vybíral. A variant of the Johnson-Lindenstrauss lemma for circulant matrices. Journal of Functional Analysis, 260:1096-1105, February 2010.
  33. Martin J. Wainwright. Basic tail and concentration bounds, pages 21-57. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019.
  34. Kilian Q. Weinberger, Anirban Dasgupta, John Langford, Alexander J. Smola, and Josh Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, pages 1113-1120, 2009.
  35. Anru R. Zhang and Yuchen Zhou. On the non-asymptotic and sharp lower tail bounds of random variables. Stat, 9(1):e314, 2020.