A Euclidean Embedding for Computing Persistent Homology with Gaussian Kernels

Authors: Jean-Daniel Boissonnat, Kunal Dutta

File

LIPIcs.ESA.2024.29.pdf
  • Filesize: 0.75 MB
  • 18 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.ESA.2024.29

Author Details

Jean-Daniel Boissonnat
  • Université Côte d'Azur, INRIA, Sophia-Antipolis, France
Kunal Dutta
  • Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Poland

Acknowledgements

The authors thank the referees for numerous helpful suggestions and remarks, which helped improve the presentation and flow of the paper. Special thanks also go to the anonymous referee of a previous version for pointing out a flaw whose correction led to a significant improvement in the present results.

Cite As

Jean-Daniel Boissonnat and Kunal Dutta. A Euclidean Embedding for Computing Persistent Homology with Gaussian Kernels. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 29:1-29:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ESA.2024.29

Abstract

Computing persistent homology of large datasets using Gaussian kernels is useful in topological data analysis and machine learning, as shown by Phillips, Wang and Zheng [SoCG 2015]. However, unlike persistent homology computation using the Euclidean distance or the k-distance, using Gaussian kernels incurs significantly higher overhead, since all distance computations are in terms of the Gaussian kernel distance, which is computationally more expensive. Further, most algorithmic implementations (e.g., Gudhi, Ripser) are based on Euclidean distances, so the question of finding a Euclidean embedding, preferably low-dimensional, that preserves the persistent homology computed with Gaussian kernels is quite important. We consider the Gaussian kernel power distance (GKPD) given by Phillips, Wang and Zheng. Given an n-point dataset and a relative error parameter ε ∈ (0,1], we show that the persistent homology of the Čech filtration of the dataset computed using the GKPD can be approximately preserved using O(ε^{-2} log n) dimensions, under a high stable-rank condition. Our results also extend to the Delaunay filtration and to the (simpler) case of the weighted Rips filtration constructed using the GKPD. Compared to the Euclidean embedding for the Gaussian kernel function in ∼ n dimensions, which uses the Cholesky decomposition of the matrix of the kernel function applied to all pairs of data points, our embedding may also be viewed as dimensionality reduction, from n to ∼ log n dimensions. Our proof utilizes the embedding of Chen and Phillips [ALT 2017], based on the Random Fourier Features of Rahimi and Recht [NeurIPS 2007], together with two novel ingredients. The first is a new decomposition of the squared radii of Čech simplices computed using the GKPD in terms of the pairwise GKPDs between the vertices, which we state and prove. The second is a new concentration inequality for sums of cosine functions of Gaussian random vectors, which we call Gaussian cosine chaoses. We believe these are of independent interest and will find other applications in the future.
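To make the two embeddings mentioned in the abstract concrete, the following Python sketch (illustrative only; all names and parameter choices are our own, and it does not implement the paper's GKPD-based construction) contrasts the exact ∼ n-dimensional embedding obtained from a Cholesky decomposition of the Gaussian-kernel Gram matrix with a Random Fourier Features embedding in D ≈ ε^{-2} log n dimensions, in the spirit of Rahimi and Recht. Both approximately realize the kernel distance D_K(x,y)^2 = 2 - 2K(x,y) as a Euclidean distance.

    # Illustrative sketch only: contrasts the exact ~n-dimensional Cholesky
    # embedding of the Gaussian-kernel Gram matrix with a low-dimensional
    # Random Fourier Features (RFF) embedding a la Rahimi-Recht. It does
    # NOT implement the paper's GKPD-based construction.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sigma = 200, 5, 1.0
    X = rng.standard_normal((n, d))               # n-point dataset in R^d

    # Gaussian kernel Gram matrix: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))

    # Exact Euclidean embedding in n dimensions via Cholesky: K = L L^T,
    # so the rows of L satisfy <L_i, L_j> = K(x_i, x_j).
    L = np.linalg.cholesky(K + 1e-10 * np.eye(n))  # small jitter for stability

    # RFF embedding in D ~ eps^{-2} log n dimensions: phi(x).phi(y) ~ K(x, y).
    eps = 0.5
    D = int(np.ceil(eps ** -2 * np.log(n)))
    W = rng.standard_normal((D, d)) / sigma        # rows ~ N(0, sigma^{-2} I)
    b = rng.uniform(0.0, 2 * np.pi, D)
    Phi = np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

    # Kernel distance: D_K(x,y)^2 = K(x,x) + K(y,y) - 2 K(x,y) = 2 - 2 K(x,y).
    # Compare it with squared Euclidean distances in the two embeddings.
    i, j = 3, 17
    print(2 - 2 * K[i, j],                         # exact kernel distance^2
          np.sum((L[i] - L[j]) ** 2),              # Cholesky embedding (exact)
          np.sum((Phi[i] - Phi[j]) ** 2))          # RFF embedding (approximate)

Since the embedded points live in ordinary Euclidean space, they can in principle be fed to Euclidean-distance-based persistence software such as Gudhi or Ripser; the paper's contribution is to show that, for the GKPD, O(ε^{-2} log n) dimensions suffice to approximately preserve the persistence diagrams of the Čech, Delaunay, and weighted Rips filtrations.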

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
  • Theory of computation → Random projections and metric embeddings
  • Theory of computation → Gaussian processes
Keywords
  • Persistent homology
  • Gaussian kernels
  • Random Fourier Features
  • Euclidean embedding

References

  1. Thomas D. Ahle, Michael Kapralov, Jakob Bæk Tejs Knudsen, Rasmus Pagh, Ameya Velingker, David P. Woodruff, and Amir Zandieh. Oblivious sketching of high-degree polynomial kernels. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 141-160. SIAM, 2020. URL: https://doi.org/10.1137/1.9781611975994.9.
  2. Hirokazu Anai, Frédéric Chazal, Marc Glisse, Yuichi Ike, Hiroya Inakoshi, Raphaël Tinarrage, and Yuhei Umeda. DTM-Based Filtrations. In Gill Barequet and Yusu Wang, editors, 35th International Symposium on Computational Geometry, SoCG 2019, June 18-21, 2019, Portland, Oregon, USA, volume 129 of LIPIcs, pages 58:1-58:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.SoCG.2019.58.
  3. N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337-404, 1950.
  4. Shreya Arya, Jean-Daniel Boissonnat, Kunal Dutta, and Martin Lotz. Dimensionality reduction for k-distance applied to persistent homology. J. Appl. Comput. Topol., 5(4):671-691, 2021. URL: https://doi.org/10.1007/s41468-021-00079-x.
  5. Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, and Amir Zandieh. Random Fourier features for kernel ridge regression: Approximation bounds and statistical guarantees. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 253-262. JMLR.org, 2017.
  6. Yair Bartal, Ben Recht, and Leonard J. Schulman. Dimensionality reduction: Beyond the Johnson-Lindenstrauss bound. In Dana Randall, editor, Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, San Francisco, California, USA, January 23-25, 2011, pages 868-887. SIAM, 2011. URL: https://doi.org/10.1137/1.9781611973082.68.
  7. Karol Borsuk. On the imbedding of systems of compacta in simplicial complexes. Fundamenta Mathematicae, 35(1):217-234, 1948. URL: http://eudml.org/doc/213158.
  8. Mickaël Buchet, Frédéric Chazal, Steve Y. Oudot, and Donald R. Sheehy. Efficient and robust persistent homology for measures. Comput. Geom., 58:70-96, 2016. URL: https://doi.org/10.1016/j.comgeo.2016.07.001.
  9. Mickaël Buchet, Tamal K. Dey, Jiayuan Wang, and Yusu Wang. Declutter and resample: Towards parameter free denoising. JoCG, 9(2):21-46, 2018. URL: https://doi.org/10.20382/jocg.v9i2a3.
  10. Moses Charikar and Paris Siminelakis. Hashing-based-estimators for kernel density in high dimensions. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 1032-1043, 2017. URL: https://doi.org/10.1109/FOCS.2017.99.
  11. Frédéric Chazal, David Cohen-Steiner, and Quentin Mérigot. Geometric inference for probability measures. Found. Comput. Math., 11(6):733-751, 2011. URL: https://doi.org/10.1007/s10208-011-9098-0.
  12. Frédéric Chazal and Steve Oudot. Towards persistence-based reconstruction in Euclidean spaces. In Monique Teillaud, editor, Proceedings of the 24th ACM Symposium on Computational Geometry, College Park, MD, USA, June 9-11, 2008, pages 232-241. ACM, 2008. URL: https://doi.org/10.1145/1377676.1377719.
  13. Di Chen and Jeff M. Phillips. Relative Error Embeddings of the Gaussian Kernel Distance. In International Conference on Algorithmic Learning Theory, ALT 2017, 15-17 October 2017, Kyoto University, Kyoto, Japan, pages 560-576, 2017. URL: http://proceedings.mlr.press/v76/chen17a.html.
  14. Herbert Edelsbrunner and John Harer. Computational Topology - an Introduction. American Mathematical Society, 2010. URL: http://www.ams.org/bookstore-getitem/item=MBK-69.
  15. Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. Discret. Comput. Geom., 28(4):511-533, 2002. URL: https://doi.org/10.1007/s00454-002-2885-2.
  16. Robert Ghrist. Elementary Applied Topology. Createspace, September 2014. URL: https://www2.math.upenn.edu/~ghrist/notes.html.
  17. C. Giraud. Introduction to High-Dimensional Statistics. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis, 2014.
  18. Leonidas J. Guibas, Dmitriy Morozov, and Quentin Mérigot. Witnessed k-distance. Discret. Comput. Geom., 49(1):22-45, 2013. URL: https://doi.org/10.1007/s00454-012-9465-x.
  19. Thomas Hofmann, Bernhard Schölkopf, and Alexander J. Smola. Kernel methods in machine learning. The Annals of Statistics, 36(3):1171-1220, 2008. URL: https://doi.org/10.1214/009053607000000677.
  20. Sarang C. Joshi, Raj Varma Kommaraju, Jeff M. Phillips, and Suresh Venkatasubramanian. Comparing distributions and shapes using the kernel distance. In Ferran Hurtado and Marc J. van Kreveld, editors, Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13-15, 2011, pages 47-56. ACM, 2011.
  21. F. Liu, X. Huang, Y. Chen, and J. K. Suykens. Random features for kernel approximation: A survey on algorithms, theory, and beyond. IEEE Transactions on Pattern Analysis & Machine Intelligence, 44(10):7128-7148, 2022. URL: https://doi.org/10.1109/TPAMI.2021.3097011.
  22. Martin Lotz. Persistent homology for low-complexity models. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 475(2230):20190081, 2019. URL: https://doi.org/10.1098/rspa.2019.0081.
  23. K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1-2):1-141, 2017. URL: https://doi.org/10.1561/2200000060.
  24. Cameron Musco and David P. Woodruff. Is input sparsity time possible for kernel low-rank approximation? In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 4438-4448, Red Hook, NY, USA, 2017. Curran Associates Inc.
  25. John Nash. C¹ isometric imbeddings. Annals of Mathematics, 60(3):383-396, 1954.
  26. Jeff M. Phillips. ε-samples for kernels. In Sanjeev Khanna, editor, Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 1622-1632. SIAM, 2013.
  27. Jeff M. Phillips and Wai Ming Tai. The GaussianSketch for Almost Relative Error Kernel Distance. In Jaroslaw Byrka and Raghu Meka, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2020, August 17-19, 2020, Virtual Conference, volume 176 of LIPIcs, pages 12:1-12:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2020.12.
  28. Jeff M. Phillips, Bei Wang, and Yan Zheng. Geometric inference on kernel density estimates. In Lars Arge and János Pach, editors, 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands, volume 34 of LIPIcs, pages 857-871. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015. URL: https://doi.org/10.4230/LIPIcs.SOCG.2015.857.
  29. Rafał Latała. Estimates of Moments and Tails of Gaussian Chaoses. Ann. Probab., 34(6):2315-2331, November 2006.
  30. Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In John C. Platt, Daphne Koller, Yoram Singer, and Sam T. Roweis, editors, Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007, pages 1177-1184. Curran Associates, Inc., 2007. URL: https://proceedings.neurips.cc/paper/2007/hash/013a006f03dbc5392effeb8f18fda755-Abstract.html.
  31. Donald R. Sheehy. The persistent homology of distance functions under random projection. In Siu-Wing Cheng and Olivier Devillers, editors, 30th Annual Symposium on Computational Geometry, SOCG'14, Kyoto, Japan, June 08 - 11, 2014, page 328. ACM, 2014. URL: https://doi.org/10.1145/2582112.2582126.
  32. Bharath K. Sriperumbudur, Kenji Fukumizu, and Gert R. G. Lanckriet. Universality, Characteristic Kernels and RKHS Embedding of Measures. J. Mach. Learn. Res., 12:2389-2410, July 2011.
  33. Michel Talagrand. Gaussian Chaos, pages 457-492. Springer International Publishing, Cham, 2021. URL: https://doi.org/10.1007/978-3-030-82595-9_15.
  34. Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018. URL: https://doi.org/10.1017/9781108231596.
  35. Siddharth Vishwanath, Kenji Fukumizu, Satoshi Kuriki, and Bharath Sriperumbudur. Robust persistence diagrams using reproducing kernels. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS'20, Red Hook, NY, USA, 2020. Curran Associates Inc.
  36. Shusen Wang, Alex Gittens, and Michael W. Mahoney. Scalable Kernel K-Means Clustering with Nyström Approximation: Relative-Error Bounds. J. Mach. Learn. Res., 20(1):431-479, January 2019.