A Sparse Johnson-Lindenstrauss Transform Using Fast Hashing

Authors: Jakob Bæk Tejs Houen, Mikkel Thorup



Author Details

Jakob Bæk Tejs Houen
  • BARC, Department of Computer Science, University of Copenhagen, Denmark
Mikkel Thorup
  • BARC, Department of Computer Science, University of Copenhagen, Denmark

Cite As

Jakob Bæk Tejs Houen and Mikkel Thorup. A Sparse Johnson-Lindenstrauss Transform Using Fast Hashing. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 76:1-76:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ICALP.2023.76

Abstract

The Sparse Johnson-Lindenstrauss Transform of Kane and Nelson (SODA 2012) provides a linear dimensionality-reducing map A ∈ ℝ^{m × u} in 𝓁₂ that preserves distances up to a distortion of 1 + ε with probability 1 - δ, where m = O(ε^{-2} log(1/δ)) and each column of A has O(εm) non-zero entries. All previous analyses of the Sparse Johnson-Lindenstrauss Transform assumed access to an Ω(log(1/δ))-wise independent hash function. The main contribution of this paper is a more general analysis of the Sparse Johnson-Lindenstrauss Transform that makes fewer assumptions about the hash function. We also show that the Mixed Tabulation hash function of Dahlgaard, Knudsen, Rotenberg, and Thorup (FOCS 2015) satisfies the conditions of our analysis, giving the first analysis of a Sparse Johnson-Lindenstrauss Transform that works with a practical hash function.
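To make the abstract's two ingredients concrete, here is a minimal Python sketch. It is illustrative only and is not the authors' code. It assumes the block construction of Kane and Nelson: the m rows are split into s blocks of m/s rows, and every input coordinate j receives exactly one ±1/√s entry per block, so each column has s non-zeros. The row and sign are drawn from a toy instance of mixed tabulation, h(x) = h2(x) ⊕ h3(h1(x)), where h1 maps the c key characters to d derived characters and h1, h2, h3 are simple tabulation functions. The names `MixedTabulation` and `sparse_jl`, the parameters (32-bit keys, 8-bit characters, d = 4), the encoding of the pair (j, k) as the key j·s + k, and the use of Python's `random` module to fill the tables are all assumptions made for illustration.

```python
import math
import random

CHAR_BITS = 8  # each character is 8 bits (assumed)
C = 4          # a 32-bit key is split into c = 4 characters (assumed)
D = 4          # number of derived characters (illustrative choice)


class MixedTabulation:
    """Toy mixed tabulation: h(x) = h2(x) XOR h3(h1(x)), where
    h1: Sigma^c -> Sigma^d, h2: Sigma^c -> 64-bit, h3: Sigma^d -> 64-bit
    are simple tabulation functions with (pseudo)random tables."""

    def __init__(self, seed):
        rng = random.Random(seed)
        size = 1 << CHAR_BITS
        # h1 tables: each entry packs D derived 8-bit characters into one int.
        self.t1 = [[rng.getrandbits(CHAR_BITS * D) for _ in range(size)] for _ in range(C)]
        self.t2 = [[rng.getrandbits(64) for _ in range(size)] for _ in range(C)]
        self.t3 = [[rng.getrandbits(64) for _ in range(size)] for _ in range(D)]

    @staticmethod
    def _chars(x, count):
        """Split x into `count` characters of CHAR_BITS bits each."""
        mask = (1 << CHAR_BITS) - 1
        return [(x >> (CHAR_BITS * i)) & mask for i in range(count)]

    def __call__(self, x):
        if not 0 <= x < 1 << (CHAR_BITS * C):
            raise ValueError("key does not fit in c characters")
        derived, out = 0, 0
        for i, ch in enumerate(self._chars(x, C)):
            derived ^= self.t1[i][ch]  # simple tabulation h1 (derived chars)
            out ^= self.t2[i][ch]      # simple tabulation h2 (original chars)
        for i, ch in enumerate(self._chars(derived, D)):
            out ^= self.t3[i][ch]      # simple tabulation h3 (derived chars)
        return out                     # a 64-bit hash value


def sparse_jl(x, m, s, h):
    """Compute A x for the block-structured sparse JL matrix defined by h:
    coordinate j gets one +-1/sqrt(s) entry in each of the s blocks."""
    if m % s != 0:
        raise ValueError("this sketch assumes s divides m")
    b = m // s                         # rows per block
    y = [0.0] * m
    scale = 1.0 / math.sqrt(s)
    for j, xj in enumerate(x):
        if xj == 0:
            continue                   # zero coordinates contribute nothing
        for k in range(s):
            v = h(j * s + k)           # hash the pair (j, k) encoded as one key
            row = k * b + (v >> 1) % b # row inside block k (slight bias unless b is a power of 2)
            sign = 1.0 if v & 1 else -1.0
            y[row] += sign * scale * xj
    return y


if __name__ == "__main__":
    h = MixedTabulation(seed=2023)
    u, m, s = 1000, 256, 8
    rng = random.Random(1)
    x = [rng.gauss(0, 1) for _ in range(u)]
    y = sparse_jl(x, m, s, h)
    ratio = math.sqrt(sum(v * v for v in y)) / math.sqrt(sum(v * v for v in x))
    print(f"norm ratio ||Ax|| / ||x|| = {ratio:.3f}")
```

Because each column places exactly one entry in each block, a standard basis vector is mapped to a vector with s entries of magnitude 1/√s, so its norm is preserved exactly. For a general vector, the ratio printed above should only be close to 1. The point of the paper is which hash functions make that closeness provable.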

ACM Subject Classification
  • Theory of computation → Random projections and metric embeddings
  • Theory of computation → Pseudorandomness and derandomization
Keywords
  • dimensionality reduction
  • hashing
  • concentration bounds
  • moment bounds

References

  1. Anders Aamand, Jakob Bæk Tejs Knudsen, Mathias Bæk Tejs Knudsen, Peter Michael Reichstein Rasmussen, and Mikkel Thorup. Fast hashing with strong concentration bounds. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 1265-1278, New York, NY, USA, 2020. Association for Computing Machinery. URL: https://doi.org/10.1145/3357713.3384259.
  2. Anders Aamand and Mikkel Thorup. Non-empty bins with simple tabulation hashing. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 2498-2512. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.153.
  3. Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671-687, 2003. Special Issue on PODS 2001. URL: https://doi.org/10.1016/S0022-0000(03)00025-4.
  4. Nir Ailon and Bernard Chazelle. The fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM Journal on Computing, 39(1):302-322, 2009. URL: https://doi.org/10.1137/060673096.
  5. Nir Ailon and Edo Liberty. Fast dimension reduction using Rademacher series on dual BCH codes. In Shang-Hua Teng, editor, Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008, pages 1-9. SIAM, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347083.
  6. Noga Alon and Bo'az Klartag. Optimal compression of approximate inner products and dimension reduction. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 639-650, October 2017. URL: https://doi.org/10.1109/FOCS.2017.65.
  7. Stefan Bamberger and Felix Krahmer. Optimal fast Johnson-Lindenstrauss embeddings for large data sets. Sampling Theory, Signal Processing, and Data Analysis, 19, June 2021. URL: https://doi.org/10.1007/s43670-021-00003-5.
  8. Vladimir Braverman, Rafail Ostrovsky, and Yuval Rabani. Rademacher chaos, random Eulerian graphs and the sparse Johnson-Lindenstrauss transform. CoRR, abs/1011.2590, 2010. URL: https://arxiv.org/abs/1011.2590.
  9. Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science, 312(1):3-15, 2004. Automata, Languages and Programming. URL: https://doi.org/10.1016/S0304-3975(03)00400-6.
  10. Michael B. Cohen, T. S. Jayram, and Jelani Nelson. Simple analyses of the sparse Johnson-Lindenstrauss transform. In Raimund Seidel, editor, 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA, volume 61 of OASIcs, pages 15:1-15:9. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/OASIcs.SOSA.2018.15.
  11. S. Dahlgaard, M. B. T. Knudsen, E. Rotenberg, and M. Thorup. Hashing for statistics over k-partitions. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 1292-1310, 2015. URL: https://doi.org/10.1109/FOCS.2015.83.
  12. Søren Dahlgaard, Mathias Bæk Tejs Knudsen, and Mikkel Thorup. Practical hash functions for similarity estimation and dimensionality reduction. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 6618-6628, USA, 2017. Curran Associates Inc. URL: http://dl.acm.org/citation.cfm?id=3295222.3295407.
  13. Søren Dahlgaard and Mikkel Thorup. Approximately minwise independence with twisted tabulation. In R. Ravi and Inge Li Gørtz, editors, Algorithm Theory - SWAT 2014, pages 134-145, Cham, 2014. Springer International Publishing.
  14. Anirban Dasgupta, Ravi Kumar, and Tamás Sarlos. A sparse Johnson-Lindenstrauss transform. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, STOC '10, pages 341-350, New York, NY, USA, 2010. Association for Computing Machinery. URL: https://doi.org/10.1145/1806689.1806737.
  15. Thong T. Do, Lu Gan, Yi Chen, Nam Nguyen, and Trac D. Tran. Fast and efficient dimensionality reduction using structurally random matrices. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1821-1824, 2009. URL: https://doi.org/10.1109/ICASSP.2009.4959960.
  16. Ora Nova Fandina, Mikael Møller Høgsgaard, and Kasper Green Larsen. Barriers for faster dimensionality reduction, 2022. URL: https://doi.org/10.48550/arXiv.2207.03304.
  17. Casper Freksen, Lior Kamma, and Kasper Green Larsen. Fully understanding the hashing trick. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 5394-5404, Red Hook, NY, USA, 2018. Curran Associates Inc.
  18. Casper Benjamin Freksen and Kasper Green Larsen. On using Toeplitz and circulant matrices for Johnson-Lindenstrauss transforms. Algorithmica, 82(2):338-354, 2020. URL: https://doi.org/10.1007/s00453-019-00644-y.
  19. Aicke Hinrichs and Jan Vybíral. Johnson-Lindenstrauss lemma for circulant matrices. Random Structures & Algorithms, 39(3):391-398, 2011. URL: https://doi.org/10.1002/rsa.20360.
  20. Jakob Bæk Tejs Houen and Mikkel Thorup. Understanding the moments of tabulation hashing via chaoses. In Mikolaj Bojanczyk, Emanuela Merelli, and David P. Woodruff, editors, 49th International Colloquium on Automata, Languages, and Programming, ICALP 2022, July 4-8, 2022, Paris, France, volume 229 of LIPIcs, pages 74:1-74:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPIcs.ICALP.2022.74.
  21. Meena Jagadeesan. Understanding sparse JL for feature hashing. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, NeurIPS'19, Red Hook, NY, USA, 2019. Curran Associates Inc.
  22. Vishesh Jain, Natesh S. Pillai, Ashwin Sah, Mehtaab Sawhney, and Aaron Smith. Fast and memory-optimal dimension reduction using Kac’s walk. The Annals of Applied Probability, 32(5):4038-4064, 2022. URL: https://doi.org/10.1214/22-AAP1784.
  23. T. S. Jayram and David P. Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Trans. Algorithms, 9(3), June 2013. URL: https://doi.org/10.1145/2483699.2483706.
  24. William Johnson and Joram Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26:189-206, January 1984. URL: https://doi.org/10.1090/conm/026/737400.
  25. Mark Kac. Foundations of kinetic theory. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, volume 3, pages 171-197, 1956.
  26. Daniel Kane, Raghu Meka, and Jelani Nelson. Almost optimal explicit Johnson-Lindenstrauss families. In Leslie Ann Goldberg, Klaus Jansen, R. Ravi, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 628-639, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
  27. Daniel M. Kane and Jelani Nelson. A derandomized sparse Johnson-Lindenstrauss transform, 2010. URL: https://doi.org/10.48550/arXiv.1006.3585.
  28. Daniel M. Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. J. ACM, 61(1), January 2014. URL: https://doi.org/10.1145/2559902.
  29. Felix Krahmer and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. SIAM Journal on Mathematical Analysis, 43(3):1269-1281, 2011. URL: https://doi.org/10.1137/100810447.
  30. Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 633-638, 2017. URL: https://doi.org/10.1109/FOCS.2017.64.
  31. Jelani Nelson and Huy L. Nguyễn. Sparsity lower bounds for dimensionality reducing maps. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC '13, pages 101-110, New York, NY, USA, 2013. Association for Computing Machinery. URL: https://doi.org/10.1145/2488608.2488622.
  32. Mihai Patrascu and Mikkel Thorup. Twisted tabulation hashing. In Sanjeev Khanna, editor, Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 209-228. SIAM, 2013. URL: https://doi.org/10.1137/1.9781611973105.16.
  33. Mihai Pǎtraşcu and Mikkel Thorup. The power of simple tabulation hashing. J. ACM, 59(3), June 2012. URL: https://doi.org/10.1145/2220357.2220361.
  34. Alan Siegel. On universal classes of extremely random constant-time hash functions. SIAM Journal on Computing, 33(3):505-543, 2004. Announced at FOCS'89.
  35. Mikkel Thorup. Simple tabulation, fast expanders, double tabulation, and high independence. In 54th Annual Symposium on Foundations of Computer Science (FOCS), pages 90-99, 2013.
  36. Mikkel Thorup and Yin Zhang. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM Journal on Computing, 41(2):293-331, 2012. URL: https://doi.org/10.1137/100800774.
  37. Jan Vybíral. A variant of the Johnson-Lindenstrauss lemma for circulant matrices. Journal of Functional Analysis, 260(4):1096-1105, 2011. URL: https://doi.org/10.1016/j.jfa.2010.11.014.
  38. Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 1113-1120, New York, NY, USA, 2009. Association for Computing Machinery. URL: https://doi.org/10.1145/1553374.1553516.
  39. Albert Lindsey Zobrist. A new hashing method with application for game playing. Technical Report 88, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 1970.