DROPS

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2021.4

Hardness of Approximation for Euclidean k-Median

Authors: Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal

Published in: LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)

Abstract

The Euclidean k-median problem is defined in the following manner: given a set 𝒳 of n points in d-dimensional Euclidean space ℝ^d, and an integer k, find a set C ⊂ ℝ^d of k points (called centers) such that the cost function Φ(C,𝒳) ≡ ∑_{x ∈ 𝒳} min_{c ∈ C} ‖x-c‖₂ is minimized. The Euclidean k-means problem is defined similarly by replacing the distance with squared Euclidean distance in the cost function. Various hardness of approximation results are known for the Euclidean k-means problem [Pranjal Awasthi et al., 2015; Euiwoong Lee et al., 2017; Vincent Cohen{-}Addad and {Karthik {C. S.}}, 2019]. However, no hardness of approximation result was known for the Euclidean k-median problem. In this work, assuming the unique games conjecture (UGC), we provide the hardness of approximation result for the Euclidean k-median problem in O(log k) dimensional space. This solves an open question posed explicitly in the work of Awasthi et al. [Pranjal Awasthi et al., 2015]. Furthermore, we study the hardness of approximation for the Euclidean k-means/k-median problems in the bi-criteria setting where an algorithm is allowed to choose more than k centers. That is, bi-criteria approximation algorithms are allowed to output β k centers (for constant β > 1) and the approximation ratio is computed with respect to the optimal k-means/k-median cost. We show the hardness of bi-criteria approximation result for the Euclidean k-median problem for any β < 1.015, assuming UGC. We also show a similar hardness of bi-criteria approximation result for the Euclidean k-means problem with a stronger bound of β < 1.28, again assuming UGC.

Cite as

Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal. Hardness of Approximation for Euclidean k-Median. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 4:1-4:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bhattacharya_et_al:LIPIcs.APPROX/RANDOM.2021.4,
  author =	{Bhattacharya, Anup and Goyal, Dishant and Jaiswal, Ragesh},
  title =	{{Hardness of Approximation for Euclidean k-Median}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)},
  pages =	{4:1--4:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-207-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{207},
  editor =	{Wootters, Mary and Sanit\`{a}, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.4},
  URN =		{urn:nbn:de:0030-drops-146979},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2021.4},
  annote =	{Keywords: Hardness of approximation, bicriteria approximation, approximation algorithms, k-median, k-means}
}

Document

DOI: 10.4230/LIPIcs.IPEC.2020.14

FPT Approximation for Constrained Metric k-Median/Means

Authors: Dishant Goyal, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 180, 15th International Symposium on Parameterized and Exact Computation (IPEC 2020)

Abstract

The Metric k-median problem over a metric space (𝒳, d) is defined as follows: given a set L ⊆ 𝒳 of facility locations and a set C ⊆ 𝒳 of clients, open a set F ⊆ L of k facilities such that the total service cost, defined as Φ(F, C) := ∑_{x ∈ C} min_{f ∈ F} d(x, f), is minimised. The metric k-means problem is defined similarly using squared distances (i.e., d²(., .) instead of d(., .)). In many applications there are additional constraints that any solution needs to satisfy. For example, to balance the load among the facilities in resource allocation problems, a capacity u is imposed on every facility. That is, no more than u clients can be assigned to any facility. This problem is known as the capacitated k-means/k-median problem. Likewise, various other applications have different constraints, which give rise to different constrained versions of the problem such as r-gather, fault-tolerant, outlier k-means/k-median problem. Surprisingly, for many of these constrained problems, no constant-approximation algorithm is known. Moreover, the unconstrained problem itself is known [Marek Adamczyk et al., 2019] to be W[2]-hard when parameterized by k. We give FPT algorithms with constant approximation guarantee for a range of constrained k-median/means problems. For some of the constrained problems, ours is the first constant factor approximation algorithm whereas for others, we improve or match the approximation guarantee of previous works. We work within the unified framework of Ding and Xu [Ding and Xu, 2015] that allows us to simultaneously obtain algorithms for a range of constrained problems. In particular, we obtain a (3+ε)-approximation and (9+ε)-approximation for the constrained versions of the k-median and k-means problem respectively in FPT time. In many practical settings of the k-median/means problem, one is allowed to open a facility at any client location, i.e., C ⊆ L. For this special case, our algorithm gives a (2+ε)-approximation and (4+ε)-approximation for the constrained versions of k-median and k-means problem respectively in FPT time. Since our algorithm is based on simple sampling technique, it can also be converted to a constant-pass log-space streaming algorithm. In particular, here are some of the main highlights of this work: 1) For the uniform capacitated k-median/means problems our results matches previously known results of Addad et al. [Vincent Cohen-Addad and Jason Li, 2019]. 2) For the r-gather k-median/means problem (clustering with lower bound on the size of clusters), our FPT approximation bounds are better than what was previously known. 3) Our approximation bounds for the fault-tolerant, outlier, and uncertain versions is better than all previously known results, albeit in FPT time. 4) For certain constrained settings such as chromatic, l-diversity, and semi-supervised k-median/means, we obtain the first constant factor approximation algorithms to the best of our knowledge. 5) Since our algorithms are based on a simple sampling based approach, we also obtain constant-pass log-space streaming algorithms for most of the above-mentioned problems.

Cite as

Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. FPT Approximation for Constrained Metric k-Median/Means. In 15th International Symposium on Parameterized and Exact Computation (IPEC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 180, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{goyal_et_al:LIPIcs.IPEC.2020.14,
  author =	{Goyal, Dishant and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{FPT Approximation for Constrained Metric k-Median/Means}},
  booktitle =	{15th International Symposium on Parameterized and Exact Computation (IPEC 2020)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-172-6},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{180},
  editor =	{Cao, Yixin and Pilipczuk, Marcin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.IPEC.2020.14},
  URN =		{urn:nbn:de:0030-drops-133170},
  doi =		{10.4230/LIPIcs.IPEC.2020.14},
  annote =	{Keywords: k-means, k-median, approximation algorithms, parameterised algorithms}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2020.13

On Sampling Based Algorithms for k-Means

Authors: Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 182, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)

Abstract

We generalise the results of Bhattacharya et al. [Bhattacharya et al., 2018] for the list-k-means problem defined as - for a (unknown) partition X₁, ..., X_k of the dataset X ⊆ ℝ^d, find a list of k-center-sets (each element in the list is a set of k centers) such that at least one of k-center-sets {c₁, ..., c_k} in the list gives an (1+ε)-approximation with respect to the cost function min_{permutation π} [∑_{i = 1}^{k} ∑_{x ∈ X_i} ||x - c_{π(i)}||²]. The list-k-means problem is important for the constrained k-means problem since algorithms for the former can be converted to {PTAS} for various versions of the latter. The algorithm for the list-k-means problem by Bhattacharya et al. is a D²-sampling based algorithm that runs in k iterations. Making use of a constant factor solution for the (classical or unconstrained) k-means problem, we generalise the algorithm of Bhattacharya et al. in two ways - (i) for any fixed set X_{j₁}, ..., X_{j_t} of t ≤ k clusters, the algorithm produces a list of (k/(ε))^{O(t/(ε))} t-center sets such that (w.h.p.) at least one of them is good for X_{j₁}, ..., X_{j_t}, and (ii) the algorithm runs in a single iteration. Following are the consequences of our generalisations: 1) Faster PTAS under stability and a parameterised reduction: Property (i) of our generalisation is useful in scenarios where finding good centers becomes easier once good centers for a few "bad" clusters have been chosen. One such case is clustering under stability of Awasthi et al. [Awasthi et al., 2010] where the number of such bad clusters is a constant. Using property (i), we significantly improve the running time of their algorithm from O(dn³) (k log{n})^{poly(1/(β), 1/(ε))} to O (dn³ (k/(ε)) ^{O(1/βε²)}). Another application is a parameterised reduction from the outlier version of k-means to the classical one where the bad clusters are the outliers. 2) Streaming algorithms: The sampling algorithm running in a single iteration (i.e., property (ii)) allows us to design a constant-pass, logspace streaming algorithm for the list-k-means problem. This can be converted to a constant-pass, logspace streaming PTAS for various constrained versions of the k-means problem. In particular, this gives a 3-pass, polylog-space streaming PTAS for the constrained binary k-means problem which in turn gives a 4-pass, polylog-space streaming PTAS for the generalised binary 𝓁₀-rank-r approximation problem. This is the first constant pass, polylog-space streaming algorithm for either of the two problems. Coreset based techniques, which is another approach for designing streaming algorithms in general, is not known to work for the constrained binary k-means problem to the best of our knowledge.

Cite as

Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. On Sampling Based Algorithms for k-Means. In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 182, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{bhattacharya_et_al:LIPIcs.FSTTCS.2020.13,
  author =	{Bhattacharya, Anup and Goyal, Dishant and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{On Sampling Based Algorithms for k-Means}},
  booktitle =	{40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-174-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{182},
  editor =	{Saxena, Nitin and Simon, Sunil},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2020.13},
  URN =		{urn:nbn:de:0030-drops-132549},
  doi =		{10.4230/LIPIcs.FSTTCS.2020.13},
  annote =	{Keywords: k-means, low rank approximation}
}

3 Search Results for "Goyal, Dishant"

Hardness of Approximation for Euclidean k-Median

Abstract

Cite as

FPT Approximation for Constrained Metric k-Median/Means

Abstract

Cite as

On Sampling Based Algorithms for k-Means

Abstract

Cite as

Thanks for your feedback!

Could not send message