eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2023-08-30
28:1
28:14
10.4230/LIPIcs.ESA.2023.28
article
On k-Means for Segments and Polylines
Cabello, Sergio
1
2
https://orcid.org/0000-0002-3183-4126
Giannopoulos, Panos
3
https://orcid.org/0000-0002-6261-1961
Faculty of Mathematics and Physics, University of Ljubljana, Slovenia
Institute of Mathematics, Physics and Mechanics, Ljubljana, Slovenia
Department of Computer Science, City, University of London, UK
We study the problem of k-means clustering in the space of straight-line segments in ℝ² under the Hausdorff distance. For this problem, we give a (1+ε)-approximation algorithm that, for an input of n segments, for any fixed k, and with constant success probability, runs in time O(n + ε^{-O(k)} + ε^{-O(k)} ⋅ log^{O(k)}(ε^{-1})). The algorithm has two main ingredients. Firstly, we express the k-means objective in our metric space as a sum of algebraic functions and use the optimization technique of Vigneron [Antoine Vigneron, 2014] to approximate its minimum. Secondly, we reduce the input size by computing a small-size coreset using the sensitivity-based sampling framework by Feldman and Langberg [Dan Feldman and Michael Langberg, 2011; Feldman et al., 2020]. Our results can be extended to polylines of constant complexity with a running time of O(n + ε^{-O(k)}).
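To make the objective concrete, the following is a minimal sketch, not the authors' algorithm: it only evaluates the k-means cost of a candidate set of k center segments under the Hausdorff distance, and does not implement Vigneron's optimization or the coreset construction. The function names (`point_segment_dist`, `hausdorff_segments`, `kmeans_cost`) are hypothetical. It relies on a standard fact: for a convex set t, the map x ↦ dist(x, t) is convex, so its maximum over a segment s is attained at an endpoint of s. The Hausdorff distance between two segments therefore reduces to four point-to-segment distances.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # (endpoint a, endpoint b)


def point_segment_dist(p: Point, s: Segment) -> float:
    """Euclidean distance from point p to the closed segment s."""
    (ax, ay), (bx, by) = s
    px, py = p
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:  # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    # Project p onto the supporting line and clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def hausdorff_segments(s: Segment, t: Segment) -> float:
    """Hausdorff distance between two segments in the plane.

    dist(., t) is convex along s, so its maximum over s is at an endpoint
    of s (and symmetrically for t); checking the four endpoints suffices.
    """
    return max(
        max(point_segment_dist(p, t) for p in s),
        max(point_segment_dist(q, s) for q in t),
    )


def kmeans_cost(inputs: List[Segment], centers: List[Segment]) -> float:
    """Sum over input segments of the squared Hausdorff distance to the
    nearest center segment (the k-means objective with k = len(centers))."""
    return sum(
        min(hausdorff_segments(s, c) ** 2 for c in centers) for s in inputs
    )


if __name__ == "__main__":
    data = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 2.0), (1.0, 2.0))]
    center = [((0.0, 1.0), (1.0, 1.0))]
    print(kmeans_cost(data, center))  # 2.0
```

In the paper's setting the k center segments are the unknowns being optimized; this sketch only scores a fixed choice, which is the quantity that an approximation algorithm (or a weighted coreset) must preserve up to a (1+ε) factor.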
https://drops.dagstuhl.de/storage/00lipics/lipics-vol274-esa2023/LIPIcs.ESA.2023.28/LIPIcs.ESA.2023.28.pdf
k-means clustering
segments
polylines
Hausdorff distance
Fréchet mean