Robust Anisotropic Power-Functions-Based Filtrations for Clustering

Author Claire Brécheteau




Author Details

Claire Brécheteau
  • Laboratoire de Mathématiques Jean Leray & École Centrale de Nantes, France

Acknowledgements

I am extremely grateful to Samuel Tapie, for his suggestion to use tangency of ellipsoids at their first intersection point, to derive the expression of their intersection radius.

Cite As

Claire Brécheteau. Robust Anisotropic Power-Functions-Based Filtrations for Clustering. In 36th International Symposium on Computational Geometry (SoCG 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 164, pp. 23:1-23:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.SoCG.2020.23

Abstract

We consider robust power-distance functions that approximate the distance function to a compact set from a noisy sample. We pay particular attention to robust power-distance functions that are anisotropic, in the sense that their sublevel sets are unions of ellipsoids and not necessarily unions of balls. Applying persistent homology to such power-distance functions yields robust clustering schemes. We investigate these clustering schemes and compare the different procedures on synthetic and real datasets. In particular, we highlight the good performance of the anisotropic method in some cases where classical methods fail.
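
To make the notion of an anisotropic power function concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the construction used in the paper: it assumes the generic form f(x) = min_i [(x - c_i)^T S_i^{-1} (x - c_i) + w_i], with no square root taken. The function name `anisotropic_power_function` and the centers, covariance matrices and weights in the example are hypothetical. The sublevel set {f <= t} is then the union of the ellipsoids {(x - c_i)^T S_i^{-1} (x - c_i) <= t - w_i}.

```python
import numpy as np

def anisotropic_power_function(x, centers, covariances, weights):
    """Evaluate min_i [(x - c_i)^T S_i^{-1} (x - c_i) + w_i] at each row of x.

    Illustrative form only (an assumption, not the paper's exact definition).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    values = np.empty((len(centers), x.shape[0]))
    for i, (c, S, w) in enumerate(zip(centers, covariances, weights)):
        diff = x - np.asarray(c, dtype=float)            # shape (n, d)
        # Solve S y = diff^T instead of forming the inverse of S explicitly.
        sol = np.linalg.solve(np.asarray(S, dtype=float), diff.T)  # shape (d, n)
        # Squared Mahalanobis norm of each row of diff, plus the weight.
        values[i] = np.einsum("nd,dn->n", diff, sol) + w
    return values.min(axis=0)

# Hypothetical example: one ellipsoid stretched along the first axis, one ball.
centers = [[0.0, 0.0], [4.0, 0.0]]
covariances = [np.diag([4.0, 1.0]), np.eye(2)]
weights = [0.0, 0.5]
points = np.array([[2.0, 0.0], [0.0, 2.0], [4.0, 0.0]])

values = anisotropic_power_function(points, centers, covariances, weights)
print(values)
```

In this example the points (2, 0) and (0, 2) lie at the same Euclidean distance from the first center, yet only (2, 0) belongs to the sublevel set at level 1, because the first ellipsoid is elongated along the first axis. With identity covariance matrices the same function reduces to a power function whose sublevel sets are unions of balls.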

Subject Classification

ACM Subject Classification
  • Theory of computation → Unsupervised learning and clustering
Keywords
  • Power functions
  • Filtrations
  • Hierarchical Clustering
  • Ellipsoids

