FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles

Authors: Luis Scoccola, Jose A. Perea

Author Details

Luis Scoccola
  • Department of Mathematics, Northeastern University, Boston, MA, USA
Jose A. Perea
  • Department of Mathematics and Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA

Acknowledgements

The authors thank Matt Piekenbrock for various fruitful conversations.

Cite As

Luis Scoccola and Jose A. Perea. FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles. In 39th International Symposium on Computational Geometry (SoCG 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 258, pp. 56:1-56:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.SoCG.2023.56

Abstract

Datasets with non-trivial large-scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large-scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers while preserving the large-scale topology. We formalize this point of view and, as an application, we describe a dimensionality reduction algorithm based on topological inference for vector bundles. The algorithm takes as input a dataset together with an initial representation in Euclidean space, assumed to recover part of its large-scale topology, and outputs a new representation that integrates local representations obtained through local linear dimensionality reduction. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in lower target dimension than various well-known metric-based dimensionality reduction algorithms.
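
As a rough illustration of the "local linear dimensionality reduction" ingredient mentioned in the abstract, the following minimal sketch fits a PCA frame on each patch of a cover, with patches defined as balls in the initial representation. It is not the authors' implementation: the function name `local_pca_frames`, its parameters, and the toy circle data are all hypothetical, and the step that aligns and integrates the local frames into a global representation is omitted entirely.

```python
import numpy as np

def local_pca_frames(data, init_repr, landmark_idx, radius, fiber_dim):
    """Hypothetical helper: for each landmark, fit PCA to the data points
    whose initial representation lies within `radius` of the landmark's."""
    frames = []
    for i in landmark_idx:
        # Patch of the cover: neighbors are measured in the initial representation.
        dists = np.linalg.norm(init_repr - init_repr[i], axis=1)
        patch = data[dists < radius]
        centered = patch - patch.mean(axis=0)
        # Rows of vt are principal directions, sorted by decreasing variance.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        frames.append(vt[:fiber_dim])
    return frames

# Toy data (an assumption for illustration): 200 points on the unit circle,
# used both as the dataset and as its own initial representation.
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
data = np.column_stack([np.cos(angles), np.sin(angles)])

# One landmark at the point (1, 0); keep a single local direction.
frames = local_pca_frames(data, data, landmark_idx=[0], radius=0.3, fiber_dim=1)
```

In this toy case the single recovered direction at the landmark (1, 0) is close to the tangent of the circle there, up to sign. A full pipeline in the spirit of the abstract would additionally need to make the local frames consistent across overlapping patches before combining them.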

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Algebraic topology
Keywords
  • topological inference
  • dimensionality reduction
  • vector bundle
  • cocycle
