DROPS

Document

DOI: 10.4230/LIPIcs.CCC.2024.16

A Strong Direct Sum Theorem for Distributional Query Complexity

Authors: Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan

Published in: LIPIcs, Volume 300, 39th Computational Complexity Conference (CCC 2024)

Abstract

Consider the expected query complexity of computing the k-fold direct product f^{⊗ k} of a function f to error ε with respect to a distribution μ^k. One strategy is to sequentially compute each of the k copies to error ε/k with respect to μ and apply the union bound. We prove a strong direct sum theorem showing that this naive strategy is essentially optimal. In particular, computing a direct product necessitates a blowup in both query complexity and error. Strong direct sum theorems contrast with results that only show a blowup in query complexity or error but not both. There has been a long line of such results for distributional query complexity, dating back to (Impagliazzo, Raz, Wigderson 1994) and (Nisan, Rudich, Saks 1994), but a strong direct sum theorem that holds for all functions in the standard query model had been elusive. A key idea in our work is the first use of the Hardcore Theorem (Impagliazzo 1995) in the context of query complexity. We prove a new resilience lemma that accompanies it, showing that the hardcore of f^{⊗k} is likely to remain dense under arbitrary partitions of the input space.

Cite as

Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan. A Strong Direct Sum Theorem for Distributional Query Complexity. In 39th Computational Complexity Conference (CCC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 300, pp. 16:1-16:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{blanc_et_al:LIPIcs.CCC.2024.16,
  author =	{Blanc, Guy and Koch, Caleb and Strassle, Carmen and Tan, Li-Yang},
  title =	{{A Strong Direct Sum Theorem for Distributional Query Complexity}},
  booktitle =	{39th Computational Complexity Conference (CCC 2024)},
  pages =	{16:1--16:30},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-331-7},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{300},
  editor =	{Santhanam, Rahul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2024.16},
  URN =		{urn:nbn:de:0030-drops-204123},
  doi =		{10.4230/LIPIcs.CCC.2024.16},
  annote =	{Keywords: Query complexity, direct product theorem, hardcore theorem}
}

Document

DOI: 10.4230/LIPIcs.CCC.2024.18

Pseudorandomness, Symmetry, Smoothing: I

Authors: Harm Derksen, Peter Ivanov, Chin Ho Lee, and Emanuele Viola

Published in: LIPIcs, Volume 300, 39th Computational Complexity Conference (CCC 2024)

Abstract

We prove several new results about bounded uniform and small-bias distributions. A main message is that, small-bias, even perturbed with noise, does not fool several classes of tests better than bounded uniformity. We prove this for threshold tests, small-space algorithms, and small-depth circuits. In particular, we obtain small-bias distributions that - achieve an optimal lower bound on their statistical distance to any bounded-uniform distribution. This closes a line of research initiated by Alon, Goldreich, and Mansour in 2003, and improves on a result by O'Donnell and Zhao. - have heavier tail mass than the uniform distribution. This answers a question posed by several researchers including Bun and Steinke. - rule out a popular paradigm for constructing pseudorandom generators, originating in a 1989 work by Ajtai and Wigderson. This again answers a question raised by several researchers. For branching programs, our result matches a bound by Forbes and Kelley. Our small-bias distributions above are symmetric. We show that the xor of any two symmetric small-bias distributions fools any bounded function. Hence our examples cannot be extended to the xor of two small-bias distributions, another popular paradigm whose power remains unknown. We also generalize and simplify the proof of a result of Bazzi.

Cite as

Harm Derksen, Peter Ivanov, Chin Ho Lee, and Emanuele Viola. Pseudorandomness, Symmetry, Smoothing: I. In 39th Computational Complexity Conference (CCC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 300, pp. 18:1-18:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{derksen_et_al:LIPIcs.CCC.2024.18,
  author =	{Derksen, Harm and Ivanov, Peter and Lee, Chin Ho and Viola, Emanuele},
  title =	{{Pseudorandomness, Symmetry, Smoothing: I}},
  booktitle =	{39th Computational Complexity Conference (CCC 2024)},
  pages =	{18:1--18:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-331-7},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{300},
  editor =	{Santhanam, Rahul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2024.18},
  URN =		{urn:nbn:de:0030-drops-204144},
  doi =		{10.4230/LIPIcs.CCC.2024.18},
  annote =	{Keywords: pseudorandomness, k-wise uniform distributions, small-bias distributions, noise, symmetric tests, thresholds, Krawtchouk polynomials}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.6

Parameterized Approximation For Robust Clustering in Discrete Geometric Spaces

Authors: Fateme Abbasi, Sandip Banerjee, Jarosław Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, and Joachim Spoerhase

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We consider the well-studied Robust (k,z)-Clustering problem, which generalizes the classic k-Median, k-Means, and k-Center problems and arises in the domains of robust optimization [Anthony, Goyal, Gupta, Nagarajan, Math. Oper. Res. 2010] and in algorithmic fairness [Abbasi, Bhaskara, Venkatasubramanian, 2021 & Ghadiri, Samadi, Vempala, 2022]. Given a constant z ≥ 1, the input to Robust (k,z)-Clustering is a set P of n points in a metric space (M,δ), a weight function w: P → ℝ_{≥ 0} and a positive integer k. Further, each point belongs to one (or more) of the m many different groups S_1,S_2,…,S_m ⊆ P. Our goal is to find a set X of k centers such that max_{i ∈ [m]} ∑_{p ∈ S_i} w(p) δ(p,X)^z is minimized. Complementing recent work on this problem, we give a comprehensive understanding of the parameterized approximability of the problem in geometric spaces where the parameter is the number k of centers. We prove the following results: [(i)] 1) For a universal constant η₀ > 0.0006, we devise a 3^z(1-η₀)-factor FPT approximation algorithm for Robust (k,z)-Clustering in discrete high-dimensional Euclidean spaces where the set of potential centers is finite. This shows that the lower bound of 3^z for general metrics [Goyal, Jaiswal, Inf. Proc. Letters, 2023] no longer holds when the metric has geometric structure. 2) We show that Robust (k,z)-Clustering in discrete Euclidean spaces is (√{3/2}- o(1))-hard to approximate for FPT algorithms, even if we consider the special case k-Center in logarithmic dimensions. This rules out a (1+ε)-approximation algorithm running in time f(k,ε)poly(m,n) (also called efficient parameterized approximation scheme or EPAS), giving a striking contrast with the recent EPAS for the continuous setting where centers can be placed anywhere in the space [Abbasi et al., FOCS'23]. 3) However, we obtain an EPAS for Robust (k,z)-Clustering in discrete Euclidean spaces when the dimension is sublogarithmic (for the discrete problem, earlier work [Abbasi et al., FOCS'23] provides an EPAS only in dimension o(log log n)). Our EPAS works also for metrics of sub-logarithmic doubling dimension.

Cite as

Fateme Abbasi, Sandip Banerjee, Jarosław Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, and Joachim Spoerhase. Parameterized Approximation For Robust Clustering in Discrete Geometric Spaces. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 6:1-6:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{abbasi_et_al:LIPIcs.ICALP.2024.6,
  author =	{Abbasi, Fateme and Banerjee, Sandip and Byrka, Jaros{\l}aw and Chalermsook, Parinya and Gadekar, Ameet and Khodamoradi, Kamyar and Marx, D\'{a}niel and Sharma, Roohani and Spoerhase, Joachim},
  title =	{{Parameterized Approximation For Robust Clustering in Discrete Geometric Spaces}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{6:1--6:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.6},
  URN =		{urn:nbn:de:0030-drops-201494},
  doi =		{10.4230/LIPIcs.ICALP.2024.6},
  annote =	{Keywords: Clustering, approximation algorithms, parameterized complexity}
}

@InProceedings{abbasi_et_al:LIPIcs.ICALP.2024.6,
  author =	{Abbasi, Fateme and Banerjee, Sandip and Byrka, Jaros{\l}aw and Chalermsook, Parinya and Gadekar, Ameet and Khodamoradi, Kamyar and Marx, D\'{a}niel and Sharma, Roohani and Spoerhase, Joachim},
  title =	{{Parameterized Approximation For Robust Clustering in Discrete Geometric Spaces}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{6:1--6:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.6},
  URN =		{urn:nbn:de:0030-drops-201494},
  doi =		{10.4230/LIPIcs.ICALP.2024.6},
  annote =	{Keywords: Clustering, approximation algorithms, parameterized complexity}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2024.65

FPT Approximation for Capacitated Sum of Radii

Authors: Ragesh Jaiswal, Amit Kumar, and Jatin Yadav

Published in: LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

Abstract

We consider the capacitated clustering problem in general metric spaces where the goal is to identify k clusters and minimize the sum of the radii of the clusters (we call this the Capacitated k-sumRadii problem). We are interested in fixed-parameter tractable (FPT) approximation algorithms where the running time is of the form f(k) ⋅ poly(n), where f(k) can be an exponential function of k and n is the number of points in the input. In the uniform capacity case, Bandyapadhyay et al. recently gave a 4-approximation algorithm for this problem. Our first result improves this to an FPT 3-approximation and extends to a constant factor approximation for any L_p norm of the cluster radii. In the general capacities version, Bandyapadhyay et al. gave an FPT 15-approximation algorithm. We extend their framework to give an FPT (4 + √13)-approximation algorithm for this problem. Our framework relies on a novel idea of identifying approximations to optimal clusters by carefully pruning points from an initial candidate set of points. This is in contrast to prior results that rely on guessing suitable points and building balls of appropriate radii around them. On the hardness front, we show that assuming the Exponential Time Hypothesis, there is a constant c > 1 such that any c-approximation algorithm for the non-uniform capacity version of this problem requires running time 2^Ω(k/polylog(k)).

Cite as

Ragesh Jaiswal, Amit Kumar, and Jatin Yadav. FPT Approximation for Capacitated Sum of Radii. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 65:1-65:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{jaiswal_et_al:LIPIcs.ITCS.2024.65,
  author =	{Jaiswal, Ragesh and Kumar, Amit and Yadav, Jatin},
  title =	{{FPT Approximation for Capacitated Sum of Radii}},
  booktitle =	{15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =	{65:1--65:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-309-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{287},
  editor =	{Guruswami, Venkatesan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.65},
  URN =		{urn:nbn:de:0030-drops-195937},
  doi =		{10.4230/LIPIcs.ITCS.2024.65},
  annote =	{Keywords: Approximation algorithm, parameterized algorithm, clustering}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2023.41

Clustering What Matters in Constrained Settings: Improved Outlier to Outlier-Free Reductions

Authors: Ragesh Jaiswal and Amit Kumar

Published in: LIPIcs, Volume 283, 34th International Symposium on Algorithms and Computation (ISAAC 2023)

Abstract

Constrained clustering problems generalize classical clustering formulations, e.g., k-median, k-means, by imposing additional constraints on the feasibility of a clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, both in the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out m points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained k-median or k-means problem to the corresponding outlier-free version with only (1+ε)-loss in the approximation ratio. The reduction is obtained by mapping the original instance of the problem to f(k, m, ε) instances of the outlier-free version, where f(k, m, ε) = ((k+m)/ε)^O(m). As specific applications, we get the following results: - First FPT (in the parameters k and m) (1+ε)-approximation algorithm for the outlier version of capacitated k-median and k-means in Euclidean spaces with hard capacities. - First FPT (in the parameters k and m) (3+ε) and (9+ε) approximation algorithms for the outlier version of capacitated k-median and k-means, respectively, in general metric spaces with hard capacities. - First FPT (in the parameters k and m) (2-δ)-approximation algorithm for the outlier version of the k-median problem under the Ulam metric. Our work generalizes the results of Bhattacharya et al. and Agrawal et al. to a larger class of constrained clustering problems. Further, our reduction works for arbitrary metric spaces and so can extend clustering algorithms for outlier-free versions in both Euclidean and arbitrary metric spaces.

Cite as

Ragesh Jaiswal and Amit Kumar. Clustering What Matters in Constrained Settings: Improved Outlier to Outlier-Free Reductions. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 41:1-41:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{jaiswal_et_al:LIPIcs.ISAAC.2023.41,
  author =	{Jaiswal, Ragesh and Kumar, Amit},
  title =	{{Clustering What Matters in Constrained Settings: Improved Outlier to Outlier-Free Reductions}},
  booktitle =	{34th International Symposium on Algorithms and Computation (ISAAC 2023)},
  pages =	{41:1--41:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-289-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{283},
  editor =	{Iwata, Satoru and Kakimura, Naonori},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2023.41},
  URN =		{urn:nbn:de:0030-drops-193433},
  doi =		{10.4230/LIPIcs.ISAAC.2023.41},
  annote =	{Keywords: clustering, constrained, outlier}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2021.4

Hardness of Approximation for Euclidean k-Median

Authors: Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal

Published in: LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)

Abstract

The Euclidean k-median problem is defined in the following manner: given a set 𝒳 of n points in d-dimensional Euclidean space ℝ^d, and an integer k, find a set C ⊂ ℝ^d of k points (called centers) such that the cost function Φ(C,𝒳) ≡ ∑_{x ∈ 𝒳} min_{c ∈ C} ‖x-c‖₂ is minimized. The Euclidean k-means problem is defined similarly by replacing the distance with squared Euclidean distance in the cost function. Various hardness of approximation results are known for the Euclidean k-means problem [Pranjal Awasthi et al., 2015; Euiwoong Lee et al., 2017; Vincent Cohen{-}Addad and {Karthik {C. S.}}, 2019]. However, no hardness of approximation result was known for the Euclidean k-median problem. In this work, assuming the unique games conjecture (UGC), we provide the hardness of approximation result for the Euclidean k-median problem in O(log k) dimensional space. This solves an open question posed explicitly in the work of Awasthi et al. [Pranjal Awasthi et al., 2015]. Furthermore, we study the hardness of approximation for the Euclidean k-means/k-median problems in the bi-criteria setting where an algorithm is allowed to choose more than k centers. That is, bi-criteria approximation algorithms are allowed to output β k centers (for constant β > 1) and the approximation ratio is computed with respect to the optimal k-means/k-median cost. We show the hardness of bi-criteria approximation result for the Euclidean k-median problem for any β < 1.015, assuming UGC. We also show a similar hardness of bi-criteria approximation result for the Euclidean k-means problem with a stronger bound of β < 1.28, again assuming UGC.

Cite as

Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal. Hardness of Approximation for Euclidean k-Median. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 4:1-4:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bhattacharya_et_al:LIPIcs.APPROX/RANDOM.2021.4,
  author =	{Bhattacharya, Anup and Goyal, Dishant and Jaiswal, Ragesh},
  title =	{{Hardness of Approximation for Euclidean k-Median}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)},
  pages =	{4:1--4:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-207-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{207},
  editor =	{Wootters, Mary and Sanit\`{a}, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.4},
  URN =		{urn:nbn:de:0030-drops-146979},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2021.4},
  annote =	{Keywords: Hardness of approximation, bicriteria approximation, approximation algorithms, k-median, k-means}
}

Document

DOI: 10.4230/LIPIcs.IPEC.2020.14

FPT Approximation for Constrained Metric k-Median/Means

Authors: Dishant Goyal, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 180, 15th International Symposium on Parameterized and Exact Computation (IPEC 2020)

Abstract

The Metric k-median problem over a metric space (𝒳, d) is defined as follows: given a set L ⊆ 𝒳 of facility locations and a set C ⊆ 𝒳 of clients, open a set F ⊆ L of k facilities such that the total service cost, defined as Φ(F, C) := ∑_{x ∈ C} min_{f ∈ F} d(x, f), is minimised. The metric k-means problem is defined similarly using squared distances (i.e., d²(., .) instead of d(., .)). In many applications there are additional constraints that any solution needs to satisfy. For example, to balance the load among the facilities in resource allocation problems, a capacity u is imposed on every facility. That is, no more than u clients can be assigned to any facility. This problem is known as the capacitated k-means/k-median problem. Likewise, various other applications have different constraints, which give rise to different constrained versions of the problem such as r-gather, fault-tolerant, outlier k-means/k-median problem. Surprisingly, for many of these constrained problems, no constant-approximation algorithm is known. Moreover, the unconstrained problem itself is known [Marek Adamczyk et al., 2019] to be W[2]-hard when parameterized by k. We give FPT algorithms with constant approximation guarantee for a range of constrained k-median/means problems. For some of the constrained problems, ours is the first constant factor approximation algorithm whereas for others, we improve or match the approximation guarantee of previous works. We work within the unified framework of Ding and Xu [Ding and Xu, 2015] that allows us to simultaneously obtain algorithms for a range of constrained problems. In particular, we obtain a (3+ε)-approximation and (9+ε)-approximation for the constrained versions of the k-median and k-means problem respectively in FPT time. In many practical settings of the k-median/means problem, one is allowed to open a facility at any client location, i.e., C ⊆ L. For this special case, our algorithm gives a (2+ε)-approximation and (4+ε)-approximation for the constrained versions of k-median and k-means problem respectively in FPT time. Since our algorithm is based on simple sampling technique, it can also be converted to a constant-pass log-space streaming algorithm. In particular, here are some of the main highlights of this work: 1) For the uniform capacitated k-median/means problems our results matches previously known results of Addad et al. [Vincent Cohen-Addad and Jason Li, 2019]. 2) For the r-gather k-median/means problem (clustering with lower bound on the size of clusters), our FPT approximation bounds are better than what was previously known. 3) Our approximation bounds for the fault-tolerant, outlier, and uncertain versions is better than all previously known results, albeit in FPT time. 4) For certain constrained settings such as chromatic, l-diversity, and semi-supervised k-median/means, we obtain the first constant factor approximation algorithms to the best of our knowledge. 5) Since our algorithms are based on a simple sampling based approach, we also obtain constant-pass log-space streaming algorithms for most of the above-mentioned problems.

Cite as

Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. FPT Approximation for Constrained Metric k-Median/Means. In 15th International Symposium on Parameterized and Exact Computation (IPEC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 180, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{goyal_et_al:LIPIcs.IPEC.2020.14,
  author =	{Goyal, Dishant and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{FPT Approximation for Constrained Metric k-Median/Means}},
  booktitle =	{15th International Symposium on Parameterized and Exact Computation (IPEC 2020)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-172-6},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{180},
  editor =	{Cao, Yixin and Pilipczuk, Marcin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.IPEC.2020.14},
  URN =		{urn:nbn:de:0030-drops-133170},
  doi =		{10.4230/LIPIcs.IPEC.2020.14},
  annote =	{Keywords: k-means, k-median, approximation algorithms, parameterised algorithms}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2020.13

On Sampling Based Algorithms for k-Means

Authors: Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 182, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)

Abstract

We generalise the results of Bhattacharya et al. [Bhattacharya et al., 2018] for the list-k-means problem defined as - for a (unknown) partition X₁, ..., X_k of the dataset X ⊆ ℝ^d, find a list of k-center-sets (each element in the list is a set of k centers) such that at least one of k-center-sets {c₁, ..., c_k} in the list gives an (1+ε)-approximation with respect to the cost function min_{permutation π} [∑_{i = 1}^{k} ∑_{x ∈ X_i} ||x - c_{π(i)}||²]. The list-k-means problem is important for the constrained k-means problem since algorithms for the former can be converted to {PTAS} for various versions of the latter. The algorithm for the list-k-means problem by Bhattacharya et al. is a D²-sampling based algorithm that runs in k iterations. Making use of a constant factor solution for the (classical or unconstrained) k-means problem, we generalise the algorithm of Bhattacharya et al. in two ways - (i) for any fixed set X_{j₁}, ..., X_{j_t} of t ≤ k clusters, the algorithm produces a list of (k/(ε))^{O(t/(ε))} t-center sets such that (w.h.p.) at least one of them is good for X_{j₁}, ..., X_{j_t}, and (ii) the algorithm runs in a single iteration. Following are the consequences of our generalisations: 1) Faster PTAS under stability and a parameterised reduction: Property (i) of our generalisation is useful in scenarios where finding good centers becomes easier once good centers for a few "bad" clusters have been chosen. One such case is clustering under stability of Awasthi et al. [Awasthi et al., 2010] where the number of such bad clusters is a constant. Using property (i), we significantly improve the running time of their algorithm from O(dn³) (k log{n})^{poly(1/(β), 1/(ε))} to O (dn³ (k/(ε)) ^{O(1/βε²)}). Another application is a parameterised reduction from the outlier version of k-means to the classical one where the bad clusters are the outliers. 2) Streaming algorithms: The sampling algorithm running in a single iteration (i.e., property (ii)) allows us to design a constant-pass, logspace streaming algorithm for the list-k-means problem. This can be converted to a constant-pass, logspace streaming PTAS for various constrained versions of the k-means problem. In particular, this gives a 3-pass, polylog-space streaming PTAS for the constrained binary k-means problem which in turn gives a 4-pass, polylog-space streaming PTAS for the generalised binary 𝓁₀-rank-r approximation problem. This is the first constant pass, polylog-space streaming algorithm for either of the two problems. Coreset based techniques, which is another approach for designing streaming algorithms in general, is not known to work for the constrained binary k-means problem to the best of our knowledge.

Cite as

Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. On Sampling Based Algorithms for k-Means. In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 182, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{bhattacharya_et_al:LIPIcs.FSTTCS.2020.13,
  author =	{Bhattacharya, Anup and Goyal, Dishant and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{On Sampling Based Algorithms for k-Means}},
  booktitle =	{40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-174-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{182},
  editor =	{Saxena, Nitin and Simon, Sunil},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2020.13},
  URN =		{urn:nbn:de:0030-drops-132549},
  doi =		{10.4230/LIPIcs.FSTTCS.2020.13},
  annote =	{Keywords: k-means, low rank approximation}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2018.40

Approximate Clustering with Same-Cluster Queries

Authors: Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 94, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)

Abstract

Ashtiani et al. proposed a Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to make adaptive queries to a domain expert. The queries are of the kind "do two given points belong to the same optimal cluster?", where the answers to these queries are assumed to be consistent with a unique optimal solution. There are many clustering contexts where such same cluster queries are feasible. Ashtiani et al. exhibited the power of such queries by showing that any instance of the k-means clustering problem, with additional margin assumption, can be solved efficiently if one is allowed to make O(k^2 log{k} + k log{n}) same-cluster queries. This is interesting since the k-means problem, even with the margin assumption, is NP-hard. In this paper, we extend the work of Ashtiani et al. to the approximation setting by showing that a few of such same-cluster queries enables one to get a polynomial-time (1+eps)-approximation algorithm for the k-means problem without any margin assumption on the input dataset. Again, this is interesting since the k-means problem is NP-hard to approximate within a factor (1+c) for a fixed constant 0 < c < 1. The number of same-cluster queries used by the algorithm is poly(k/eps) which is independent of the size n of the dataset. Our algorithm is based on the D^2-sampling technique, also known as the k-means++ seeding algorithm. We also give a conditional lower bound on the number of same-cluster queries showing that if the Exponential Time Hypothesis (ETH) holds, then any such efficient query algorithm needs to make Omega (k/poly log k) same-cluster queries. Our algorithm can be extended for the case where the query answers are wrong with some bounded probability. Another result we show for the k-means++ seeding is that a small modification of the k-means++ seeding within the SSAC framework converts it to a constant factor approximation algorithm instead of the well known O(log k)-approximation algorithm.

Cite as

Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. Approximate Clustering with Same-Cluster Queries. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 40:1-40:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{ailon_et_al:LIPIcs.ITCS.2018.40,
  author =	{Ailon, Nir and Bhattacharya, Anup and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{Approximate Clustering with Same-Cluster Queries}},
  booktitle =	{9th Innovations in Theoretical Computer Science Conference (ITCS 2018)},
  pages =	{40:1--40:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-060-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{94},
  editor =	{Karlin, Anna R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2018.40},
  URN =		{urn:nbn:de:0030-drops-83358},
  doi =		{10.4230/LIPIcs.ITCS.2018.40},
  annote =	{Keywords: k-means, semi-supervised learning, query bounds}
}

Document

DOI: 10.4230/LIPIcs.STACS.2016.16

Faster Algorithms for the Constrained k-Means Problem

Authors: Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 47, 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)

Abstract

The classical center based clustering problems such as k-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. For instance, consider the r-gather clustering problem where there is an additional constraint that each of the clusters should have at least r points or the capacitated clustering problem where there is an upper bound on the cluster sizes. Consider a variant of the k-means problem that may be regarded as a general version of such problems. Here, the optimal clusters O_1, ..., O_k are an arbitrary partition of the dataset and the goal is to output k-centers c_1, ..., c_k such that the objective function sum_{i=1}^{k} sum_{x in O_{i}} ||x - c_{i}||^2 is minimized. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of k centers, will not behave well as far as optimizing the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such k centers such that at least one of these k centers behaves well. Given an error parameter epsilon > 0, let l denote the size of the smallest list of k-centers such that at least one of the k-centers gives a (1+epsilon) approximation w.r.t. the objective function above. In this paper, we show an upper bound on l by giving a randomized algorithm that outputs a list of 2^{~O(k/epsilon)} k-centers. We also give a closely matching lower bound of 2^{~Omega(k/sqrt{epsilon})}. Moreover, our algorithm runs in time O(n * d * 2^{~O(k/epsilon)}). This is a significant improvement over the previous result of Ding and Xu who gave an algorithm with running time O(n * d * (log{n})^{k} * 2^{poly(k/epsilon)}) and output a list of size O((log{n})^k * 2^{poly(k/epsilon)}). Our techniques generalize for the k-median problem and for many other settings where non-Euclidean distance measures are involved.

Cite as

Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. Faster Algorithms for the Constrained k-Means Problem. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 16:1-16:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{bhattacharya_et_al:LIPIcs.STACS.2016.16,
  author =	{Bhattacharya, Anup and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{Faster Algorithms for the Constrained k-Means Problem}},
  booktitle =	{33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)},
  pages =	{16:1--16:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-001-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{47},
  editor =	{Ollinger, Nicolas and Vollmer, Heribert},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2016.16},
  URN =		{urn:nbn:de:0030-drops-57179},
  doi =		{10.4230/LIPIcs.STACS.2016.16},
  annote =	{Keywords: k-means, k-median, approximation algorithm, sampling}
}

10 Search Results for "Jaiswal, Ragesh"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as