LIPIcs.SoCG.2023.10.pdf
- Filesize: 0.88 MB
- 16 pages
Designing coresets - small-space sketches of the data preserving cost of the solutions within (1± ε)-approximate factor - is an important research direction in the study of center-based k-clustering problems, such as k-means or k-median. Feldman and Langberg [STOC'11] have shown that for k-clustering of n points in general metrics, it is possible to obtain coresets whose size depends logarithmically in n. Moreover, such a dependency in n is inevitable in general metrics. A significant amount of recent work in the area is devoted to obtaining coresests whose sizes are independent of n for special metrics, like d-dimensional Euclidean space [Huang, Vishnoi, STOC'20], doubling metrics [Huang, Jiang, Li, Wu, FOCS'18], metrics of graphs of bounded treewidth [Baker, Braverman, Huang, Jiang, Krauthgamer, Wu, ICML’20], or graphs excluding a fixed minor [Braverman, Jiang, Krauthgamer, Wu, SODA’21]. In this paper, we provide the first constructions of coresets whose size does not depend on n for k-clustering in the metrics induced by geometric intersection graphs. For example, we obtain (k log²k)/ε^𝒪(1) size coresets for k-clustering in Euclidean-weighted unit-disk graphs (UDGs) and unit-square graphs (USGs). These constructions follow from a general theorem that identifies two canonical properties of a graph metric sufficient for obtaining coresets whose size is independent of n. The proof of our theorem builds on the recent work of Cohen-Addad, Saulpic, and Schwiegelshohn [STOC '21], which ensures small-sized coresets conditioned on the existence of an interesting set of centers, called centroid set. The main technical contribution of our work is the proof of the existence of such a small-sized centroid set for graphs that satisfy the two canonical properties. Loosely speaking, the metrics of geometric intersection graphs are "similar" to the Euclidean metrics for points that are close, and to the shortest path metrics of planar graphs for points that are far apart. The main technical challenge in constructing centroid sets of small sizes is in combining these two very different metrics. The new coreset construction helps to design the first (1+ε)-approximation for center-based clustering problems in UDGs and USGs, that is fixed-parameter tractable in k and ε (FPT-AS).
Feedback for Dagstuhl Publishing