DROPS: Documents authored by Eube, Jan

Document

DOI: 10.4230/LIPIcs.ESA.2025.63

Connected k-Median with Disjoint and Non-Disjoint Clusters

Authors: Jan Eube, Kelin Luo, Dorian Reineccius, Heiko Röglin, and Melanie Schmidt

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

The connected k-median problem is a constrained clustering problem that combines distance-based k-clustering with connectivity information. The input consists of a metric space and an unweighted, undirected connectivity graph G on the input points that can be completely unrelated to the metric space. The goal is to compute k centers and corresponding clusters such that each cluster forms a connected subgraph of G and the k-median cost is minimized. The problem has applications in very different fields such as geodesy (particularly districting), social network analysis (especially community detection), and bioinformatics. We study a version with overlapping clusters, in which points can be part of multiple clusters; this is natural for the use case of community detection. This problem variant is Ω(log n)-hard to approximate, and our main result is an 𝒪(k² log n)-approximation algorithm for it. We complement this with an Ω(n^{1-ε})-hardness result for the case of disjoint clusters (without overlap) and general connectivity graphs, as well as an exact algorithm for this setting when the connectivity graph is a tree.
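To make the feasibility condition and the objective concrete, here is a minimal sketch in Python that checks whether every cluster forms a connected subgraph of G (via BFS restricted to the cluster) and then evaluates the cost. It is an illustration under stated assumptions, not the algorithm from the paper: the function names are hypothetical, the metric is given as a distance matrix `dist`, G as an adjacency dictionary `adj`, and, as an ASSUMPTION for the overlapping variant, a point lying in several clusters pays the distance to the closest center among those clusters (for disjoint clusters this reduces to the ordinary k-median cost).

```python
from collections import deque
from typing import Dict, List, Sequence, Set


def is_connected(cluster: Set[int], adj: Dict[int, Set[int]]) -> bool:
    """Return True if `cluster` induces a connected subgraph of G.

    BFS that only follows edges of G whose endpoints both lie in the cluster.
    """
    if not cluster:
        return False
    start = next(iter(cluster))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in cluster and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == cluster


def connected_kmedian_cost(
    dist: Sequence[Sequence[float]],
    adj: Dict[int, Set[int]],
    centers: List[int],
    clusters: List[Set[int]],
) -> float:
    """Evaluate a candidate (possibly overlapping) connected k-median solution.

    ASSUMPTION (hypothetical cost rule): a point contained in several clusters
    pays the distance to the closest center among those clusters.
    Raises ValueError if the solution is infeasible.
    """
    if len(centers) != len(clusters):
        raise ValueError("need exactly one center per cluster")
    for cluster in clusters:
        if not is_connected(cluster, adj):
            raise ValueError(f"cluster {sorted(cluster)} is not connected in G")
    total = 0.0
    for p in range(len(dist)):
        options = [dist[p][c] for c, cl in zip(centers, clusters) if p in cl]
        if not options:
            raise ValueError(f"point {p} is not covered by any cluster")
        total += min(options)
    return total


# Usage: four points on a line, connectivity graph is the path 0-1-2-3.
xs = [0, 1, 10, 11]
dist = [[abs(a - b) for b in xs] for a in xs]
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(connected_kmedian_cost(dist, adj, [0, 2], [{0, 1}, {2, 3}]))  # 2.0
```

Splitting {0, 2} and {1, 3} into clusters would be rejected, since neither set is connected in the path graph even though it may look reasonable metrically.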

Cite as

Jan Eube, Kelin Luo, Dorian Reineccius, Heiko Röglin, and Melanie Schmidt. Connected k-Median with Disjoint and Non-Disjoint Clusters. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 63:1-63:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{eube_et_al:LIPIcs.ESA.2025.63,
  author =	{Eube, Jan and Luo, Kelin and Reineccius, Dorian and R\"{o}glin, Heiko and Schmidt, Melanie},
  title =	{{Connected k-Median with Disjoint and Non-Disjoint Clusters}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{63:1--63:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.63},
  URN =		{urn:nbn:de:0030-drops-245317},
  doi =		{10.4230/LIPIcs.ESA.2025.63},
  annote =	{Keywords: Clustering, Connectivity constraints, Approximation algorithms}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2023.50

Connected k-Center and k-Diameter Clustering

Authors: Lukas Drexler, Jan Eube, Kelin Luo, Heiko Röglin, Melanie Schmidt, and Julian Wargalla

Published in: LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)

Abstract

Motivated by an application from geodesy, we study the connected k-center problem and the connected k-diameter problem. These problems arise from the classical k-center and k-diameter problems by adding a side constraint: we are given an undirected connectivity graph G on the input points, and a clustering is feasible only if every cluster induces a connected subgraph of G. Clustering problems usually assume that the clusters are pairwise disjoint. We study this case, but also the case in which clusters are allowed to overlap, which can help to satisfy the connectivity constraints. Our main result is an O(1)-approximation algorithm for the disjoint connected k-center and k-diameter problems in Euclidean spaces of low (constant) dimension d and in metrics with constant doubling dimension. For general metrics, we obtain an O(log² k)-approximation. Our algorithms first compute a non-disjoint connected clustering and then transform it into a disjoint connected clustering. We complement these upper bounds with several further upper and lower bounds for variations and special cases of the model.
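As a small illustration of the two objectives under the connectivity side constraint, the following Python sketch evaluates a given clustering: the k-center value as the largest distance from a cluster member to its own cluster's center, and the k-diameter value as the largest pairwise distance inside any cluster. This is a hedged sketch, not the paper's approximation algorithm; the function names and the input encoding (distance matrix `dist`, adjacency dictionary `adj`) are assumptions made for illustration. Because both objectives are computed per cluster, the same code also accepts overlapping clusters.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Sequence, Set


def is_connected(cluster: Set[int], adj: Dict[int, Set[int]]) -> bool:
    """BFS inside the cluster: True iff the cluster induces a connected subgraph of G."""
    if not cluster:
        return False
    start = next(iter(cluster))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in cluster and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == cluster


def check_feasible(clusters: List[Set[int]], adj: Dict[int, Set[int]]) -> None:
    """Raise ValueError unless every cluster is connected in G."""
    for cluster in clusters:
        if not is_connected(cluster, adj):
            raise ValueError(f"cluster {sorted(cluster)} is not connected in G")


def connected_k_center_radius(
    dist: Sequence[Sequence[float]],
    adj: Dict[int, Set[int]],
    centers: List[int],
    clusters: List[Set[int]],
) -> float:
    """Largest distance from any cluster member to the center of that cluster."""
    check_feasible(clusters, adj)
    return max(dist[p][c] for c, cl in zip(centers, clusters) for p in cl)


def connected_k_diameter(
    dist: Sequence[Sequence[float]],
    adj: Dict[int, Set[int]],
    clusters: List[Set[int]],
) -> float:
    """Largest pairwise distance within any single cluster (0 for singletons)."""
    check_feasible(clusters, adj)
    return max(
        (dist[p][q] for cl in clusters for p, q in combinations(sorted(cl), 2)),
        default=0,
    )


# Usage: points on a line, connectivity graph is the path 0-1-2-3.
xs = [0, 1, 10, 11]
dist = [[abs(a - b) for b in xs] for a in xs]
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
clusters = [{0, 1}, {2, 3}]
print(connected_k_center_radius(dist, adj, [0, 2], clusters))  # 1
print(connected_k_diameter(dist, adj, clusters))  # 1
```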

Cite as

Lukas Drexler, Jan Eube, Kelin Luo, Heiko Röglin, Melanie Schmidt, and Julian Wargalla. Connected k-Center and k-Diameter Clustering. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 50:1-50:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{drexler_et_al:LIPIcs.ICALP.2023.50,
  author =	{Drexler, Lukas and Eube, Jan and Luo, Kelin and R\"{o}glin, Heiko and Schmidt, Melanie and Wargalla, Julian},
  title =	{{Connected k-Center and k-Diameter Clustering}},
  booktitle =	{50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages =	{50:1--50:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-278-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{261},
  editor =	{Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.50},
  URN =		{urn:nbn:de:0030-drops-181024},
  doi =		{10.4230/LIPIcs.ICALP.2023.50},
  annote =	{Keywords: Approximation algorithms, Clustering, Connectivity constraints}
}

Document

DOI: 10.4230/LIPIcs.ESA.2020.18

Noisy, Greedy and Not so Greedy k-Means++

Authors: Anup Bhattacharya, Jan Eube, Heiko Röglin, and Melanie Schmidt

Published in: LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

Abstract

The k-means++ algorithm due to Arthur and Vassilvitskii [David Arthur and Sergei Vassilvitskii, 2007] has become the most popular seeding method for Lloyd’s algorithm. It samples the first center uniformly at random from the data set and the other k-1 centers iteratively according to D²-sampling, i.e., the probability that a data point becomes the next center is proportional to its squared distance to the closest center chosen so far. k-means++ is known to achieve an approximation factor of 𝒪(log k) in expectation. Already in the original paper on k-means++, Arthur and Vassilvitskii suggested a variation called the greedy k-means++ algorithm, in which multiple candidate centers are sampled in each iteration according to D²-sampling and only the one that decreases the objective the most is chosen as the center for that iteration. Whether this also leads to an 𝒪(log k)-approximation (or even better) was stated as an open question. We show that this is not the case by presenting a family of instances on which greedy k-means++ yields only an Ω(ℓ⋅log k)-approximation in expectation, where ℓ is the number of candidate centers sampled in each iteration. Inspired by these negative results, we study a variation of greedy k-means++ which we call the noisy k-means++ algorithm. In this variation, only one center is sampled in every iteration, but not exactly by D²-sampling. Instead, in each iteration an adversary may change the probabilities arising from D²-sampling individually for each point by a factor between 1-ε₁ and 1+ε₂ for parameters ε₁ ∈ [0,1) and ε₂ ≥ 0. We prove that noisy k-means++ computes an 𝒪(log² k)-approximation in expectation. We use the analysis of noisy k-means++ to design a moderately greedy k-means++ algorithm.
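The seeding procedures described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' code: the function names are hypothetical, points are tuples in Euclidean space, the ℓ candidates of an iteration are drawn independently (with replacement) from the same D² distribution, and setting `ell=1` recovers standard k-means++. The adversarial perturbation of the noisy variant is not modeled here.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: sum of squared distances to the closest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def d2_sample(points: Sequence[Point], centers: Sequence[Point], rng: random.Random) -> Point:
    """D²-sampling: pick a point with probability proportional to its
    squared distance to the closest center chosen so far."""
    weights = [min(sq_dist(p, c) for c in centers) for p in points]
    if sum(weights) == 0:
        # Every point already coincides with a center; fall back to uniform.
        return rng.choice(points)
    return rng.choices(points, weights=weights, k=1)[0]


def greedy_kmeanspp(
    points: Sequence[Point], k: int, ell: int = 1, seed: Optional[int] = None
) -> List[Point]:
    """Greedy k-means++ seeding with `ell` candidates per iteration.

    The first center is chosen uniformly at random; each later iteration draws
    `ell` D²-samples and keeps the one that lowers the k-means cost the most.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    for _ in range(k - 1):
        candidates = [d2_sample(points, centers, rng) for _ in range(ell)]
        best = min(candidates, key=lambda c: kmeans_cost(points, centers + [c]))
        centers.append(best)
    return centers


# Usage: two groups of coincident points; the second center always lands
# in the group not yet covered, because points on a center get weight 0.
points = [(0.0,), (0.0,), (5.0,), (5.0,)]
centers = greedy_kmeanspp(points, k=2, ell=3, seed=1)
print(kmeans_cost(points, centers))  # 0.0
```

Each greedy step evaluates the full objective for every candidate, so it costs roughly ℓ times as much as a plain k-means++ step; the sketch favors clarity over incremental cost updates.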

Cite as

Anup Bhattacharya, Jan Eube, Heiko Röglin, and Melanie Schmidt. Noisy, Greedy and Not so Greedy k-Means++. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 18:1-18:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{bhattacharya_et_al:LIPIcs.ESA.2020.18,
  author =	{Bhattacharya, Anup and Eube, Jan and R\"{o}glin, Heiko and Schmidt, Melanie},
  title =	{{Noisy, Greedy and Not so Greedy k-Means++}},
  booktitle =	{28th Annual European Symposium on Algorithms (ESA 2020)},
  pages =	{18:1--18:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-162-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{173},
  editor =	{Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.18},
  URN =		{urn:nbn:de:0030-drops-128848},
  doi =		{10.4230/LIPIcs.ESA.2020.18},
  annote =	{Keywords: k-means++, greedy, adaptive sampling}
}
