Fréchet Mean and p-Mean on the Unit Circle: Decidability, Algorithm, and Applications to Clustering on the Flat Torus

Authors Frédéric Cazals, Bernard Delmas, Timothee O'Donnell



PDF
Thumbnail PDF

File

LIPIcs.SEA.2021.15.pdf
  • Filesize: 1.1 MB
  • 16 pages

Document Identifiers

Author Details

Frédéric Cazals
  • Université Côte d'Azur, France
  • Inria, Sophia Antipolis, France
Bernard Delmas
  • INRAe, Jouy-en-Josas, France
Timothee O'Donnell
  • Université Côte d'Azur, France
  • Inria, Sophia Antipolis, France

Acknowledgements

Chee Yap and Sylvain Pion are acknowledged for discussions on irrational number theory and number types, respectively.

Cite AsGet BibTex

Frédéric Cazals, Bernard Delmas, and Timothee O'Donnell. Fréchet Mean and p-Mean on the Unit Circle: Decidability, Algorithm, and Applications to Clustering on the Flat Torus. In 19th International Symposium on Experimental Algorithms (SEA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 190, pp. 15:1-15:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.SEA.2021.15

Abstract

The center of mass of a point set lying on a manifold generalizes the celebrated Euclidean centroid, and is ubiquitous in statistical analysis in non Euclidean spaces. In this work, we give a complete characterization of the weighted p-mean of a finite set of angular values on S¹, based on a decomposition of S¹ such that the functional of interest has at most one local minimum per cell. This characterization is used to show that the problem is decidable for rational angular values -a consequence of Lindemann’s theorem on the transcendence of π, and to develop an effective algorithm parameterized by exact predicates. A robust implementation of this algorithm based on multi-precision interval arithmetic is also presented, and is shown to be effective for large values of n and p. We use it as building block to implement the k-means and k-means++ clustering algorithms on the flat torus, with applications to clustering protein molecular conformations. These algorithms are available in the Structural Bioinformatics Library (http://sbl.inria.fr). Our derivations are of interest in two respects. First, efficient p-mean calculations are relevant to develop principal components analysis on the flat torus encoding angular spaces-a particularly important case to describe molecular conformations. Second, our two-stage strategy stresses the interest of combinatorial methods for p-means, also emphasizing the role of numerical issues.

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • Frechét mean
  • p-mean
  • circular statistics
  • decidability
  • robustness
  • multi-precision
  • angular spaces
  • flat torus
  • clustering
  • molecular conformations

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Cgal, Computational Geometry Algorithms Library. http://www.cgal.org. Google Scholar
  2. F. Allen and O. Johnson. Automated conformational analysis from crystallographic data. 4. statistical descriptors for a distribution of torsion angles. Acta Crystallographica Section B: Structural Science, 47(1):62-67, 1991. Google Scholar
  3. M. Arnaudon and L. Miclo. Means in complete manifolds: uniqueness and approximation. ESAIM: Probability and Statistics, 18:185-206, 2014. Google Scholar
  4. M. Arnaudon and L. Miclo. A stochastic algorithm finding p-means on the circle. Bernoulli, 22(4):2237-2300, 2016. Google Scholar
  5. D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In ACM-SODA, page 1035. Society for Industrial and Applied Mathematics, 2007. Google Scholar
  6. F. Cazals and T. Dreyfus. The Structural Bioinformatics Library: modeling in biomolecular science and beyond. Bioinformatics, 7(33):1-8, 2017. URL: https://doi.org/10.1093/bioinformatics/btw752.
  7. F. Cazals, D. Mazauric, R. Tetley, and R. Watrigant. Comparing two clusterings using matchings between clusters of clusters. ACM J. of Experimental Algorithms, 24(1):1-42, 2019. URL: https://doi.org/10.1145/3345951.
  8. E-C. Chang, S.W. Choi, D.Y. Kwon, H. Park, and C. Yap. Shortest path amidst disc obstacles is computable. International Journal of Computational Geometry Applications, 16(05n06):567-590, 2006. Google Scholar
  9. B. Charlier. Necessary and sufficient condition for the existence of a Fréchet mean on the circle. ESAIM: Probability and Statistics, 17:635-649, 2013. Google Scholar
  10. L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann. MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Transactions on Mathematical Software (TOMS), 33(2):13, 2007. Google Scholar
  11. M. Fréchet. Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l'institut Henri Poincaré, 10(4):215-310, 1948. Google Scholar
  12. K. Grove and H. Karcher. How to conjugatec 1-close group actions. Mathematische Zeitschrift, 132(1):11-20, 1973. Google Scholar
  13. T. Hotz and s. Huckemann. Intrinsic means on the circle: Uniqueness, locus and asymptotics. Annals of the Institute of Statistical Mathematics, 67(1):177-193, 2015. Google Scholar
  14. S.R. Jammalamadaka and A. SenGupta. Topics in Circular Statistics. World Scientific, 2001. Google Scholar
  15. V. Karamcheti, C. Li, I. Pechtchanski, and C. Yap. A core library for robust numeric and geometric computation. In Proceedings of the fifteenth annual symposium on Computational geometry, pages 351-359. ACM, 1999. Google Scholar
  16. L. Kettner, K. Mehlhorn, S. Pion, S. Schirra, and C. Yap. Classroom examples of robustness problems in geometric computations. Computational Geometry, 40(1):61-78, 2008. Google Scholar
  17. A. Kobel, F. Rouillier, and M. Sagraloff. Computing real roots of real polynomials... and now for real! In Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation, pages 303-310. ACM, 2016. Google Scholar
  18. C. Li, S. Pion, and C. Yap. Recent progress in exact geometric computation. The Journal of Logic and Algebraic Programming, 64(1):85-111, 2005. Google Scholar
  19. M. MacArthur and J. Thornton. Conformational analysis of protein structures derived from nmr data. Proteins: Structure, Function, and Bioinformatics, 17(3):232-251, 1993. Google Scholar
  20. K. Mardia and P. Jupp. Directional statistics, volume 494. John Wiley and Sons, 2009. Google Scholar
  21. J.-M. Muller, N. Brunie, F. de Dinechin, C. Jeannerod, M. Joldes, V. Lefèvre, G. Melquiond, N. Revol, and S. Torres. Handbook of floating-point arithmetic. Springer, 2018. Google Scholar
  22. A. Ng. Clustering with the k-means algorithm. Machine Learning, 2012. Google Scholar
  23. X. Pennec. Barycentric subspace analysis on manifolds. The Annals of Statistics, 46(6A):2711-2746, 2018. Google Scholar
  24. F. Preparata and M. Shamos. Computational geometry: an introduction. Springer Science and Business Media, 1985. Google Scholar
  25. Maxim V Shapovalov and Roland L Dunbrack Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure, 19(6):844-858, 2011. Google Scholar
  26. D. Ting, G. Wang, M. Shapovalov, R. Mitra, M.I. Jordan, and R. Dunbrack. Neighbor-dependent ramachandran probability distributions of amino acids developed from a hierarchical dirichlet process model. PLoS Comput Biol, 6(4):e1000763, 2010. Google Scholar
  27. C. Yap and T. Dubé. The exact computation paradigm. In Computing in Euclidean Geometry, pages 452-492. World Scientific, 1995. Google Scholar
  28. J. Yu, C. Yap, Z. Du, S. Pion, and Hervé H. Brönnimann. The design of core 2: A library for exact numeric computation in geometry and algebra. In International Congress on Mathematical Software, pages 121-141. Springer, 2010. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail