14 Search Results for "Jaiswal, Ragesh"


Document
Polynomial-Time Constant-Approximation for Fair Sum-Of-Radii Clustering

Authors: Sina Bagheri Nezhad, Sayan Bandyapadhyay, and Tianzhi Chen

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)


Abstract
In a seminal work, Chierichetti et al. [Chierichetti et al., 2017] introduced the (t,k)-fair clustering problem: Given a set of red points and a set of blue points in a metric space, a clustering is called fair if the number of red points in each cluster is at most t times and at least 1/t times the number of blue points in that cluster. The goal is to compute a fair clustering with at most k clusters that optimizes a certain objective function. For this problem, they designed polynomial-time O(1)- and O(t)-approximations for the k-center and the k-median objective, respectively. Recently, Carta et al. [Carta et al., 2024] studied this problem with the sum-of-radii objective and obtained a (6+ε)-approximation with running time O((k log_{1+ε}(k/ε))^k n^{O(1)}), i.e., fixed-parameter tractable in k. Here n is the input size. In this work, we design the first polynomial-time O(1)-approximation for (t,k)-fair clustering with the sum-of-radii objective, improving the result of Carta et al. Our result places sum-of-radii in the same group of objectives as k-center, which admit polynomial-time O(1)-approximations. It also implies a polynomial-time O(1)-approximation for the Euclidean version of the problem, for which an f(k)⋅n^{O(1)}-time (1+ε)-approximation was known due to Drexler et al. [Drexler et al., 2023], where f is an exponential function of k. We also extend our result to an arbitrary number 𝓁 ≥ 2 of colors when t = 1, matching known results for the k-center and k-median objectives in this case. The significant differences between sum-of-radii and the k-center and k-median objectives present several complex challenges, all of which we overcome in this work. Our main contribution is a novel cluster-merging-based analysis technique for sum-of-radii that helps us achieve the constant-approximation bounds.
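
A minimal sketch of the (t,k)-fairness condition described above: in every cluster, the number of red points must lie between 1/t and t times the number of blue points, and at most k clusters are allowed. The function and variable names are illustrative, not taken from the paper.

def is_fair_cluster(reds_in_cluster: int, blues_in_cluster: int, t: float) -> bool:
    """Return True if the red/blue balance of one cluster is within [1/t, t]."""
    if reds_in_cluster == 0 or blues_in_cluster == 0:
        return False  # a ratio bounded in [1/t, t] forces both colors to appear
    ratio = reds_in_cluster / blues_in_cluster
    return 1.0 / t <= ratio <= t

def is_fair_clustering(clusters, colors, t: float, k: int) -> bool:
    """clusters: list of lists of point ids; colors: dict id -> 'red' or 'blue'."""
    if len(clusters) > k:
        return False
    for cluster in clusters:
        reds = sum(1 for p in cluster if colors[p] == "red")
        blues = len(cluster) - reds
        if not is_fair_cluster(reds, blues, t):
            return False
    return True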

Cite as

Sina Bagheri Nezhad, Sayan Bandyapadhyay, and Tianzhi Chen. Polynomial-Time Constant-Approximation for Fair Sum-Of-Radii Clustering. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 62:1-62:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{bagherinezhad_et_al:LIPIcs.ESA.2025.62,
  author =	{Bagheri Nezhad, Sina and Bandyapadhyay, Sayan and Chen, Tianzhi},
  title =	{{Polynomial-Time Constant-Approximation for Fair Sum-Of-Radii Clustering}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{62:1--62:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.62},
  URN =		{urn:nbn:de:0030-drops-245309},
  doi =		{10.4230/LIPIcs.ESA.2025.62},
  annote =	{Keywords: fair clustering, sum-of-radii clustering, approximation algorithms}
}
Document
APPROX
Improved FPT Approximation for Sum of Radii Clustering with Mergeable Constraints

Authors: Sayan Bandyapadhyay and Tianzhi Chen

Published in: LIPIcs, Volume 353, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)


Abstract
In this work, we study k-min-sum-of-radii (k-MSR) clustering under mergeable constraints. k-MSR seeks to group data points using a set of at most k balls such that the sum of the radii of the balls is minimized. A clustering constraint is called mergeable if merging two clusters that satisfy the constraint results in a cluster that also satisfies the constraint. Many widely studied constraints are mergeable, including fairness constraints and lower bound constraints. In our work, we design a (4+ε)-approximation for k-MSR under any given mergeable constraint with running time 2^{O((k/ε)⋅log²(k/ε))} n⁴, i.e., fixed-parameter tractable in k for constant ε. Our result directly improves upon the FPT (6+ε)-approximation by Carta et al. [Carta et al., 2024]. We also provide a hardness result that rules out solving k-MSR exactly under any given mergeable constraint in time f(k)n^{o(k)}, assuming ETH.
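
A tiny sketch of the "mergeable" property defined above, using the lower bound constraint (every cluster must contain at least r points) as an example; the names and the concrete clusters are purely illustrative.

def satisfies_lower_bound(cluster, r: int) -> bool:
    # The constraint: a cluster is feasible if it has at least r points.
    return len(cluster) >= r

def merge(cluster_a, cluster_b):
    return cluster_a + cluster_b

# If both clusters satisfy the constraint, so does their merge (mergeability):
a, b, r = list(range(5)), list(range(5, 12)), 4
assert satisfies_lower_bound(a, r) and satisfies_lower_bound(b, r)
assert satisfies_lower_bound(merge(a, b), r)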

Cite as

Sayan Bandyapadhyay and Tianzhi Chen. Improved FPT Approximation for Sum of Radii Clustering with Mergeable Constraints. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 353, pp. 23:1-23:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{bandyapadhyay_et_al:LIPIcs.APPROX/RANDOM.2025.23,
  author =	{Bandyapadhyay, Sayan and Chen, Tianzhi},
  title =	{{Improved FPT Approximation for Sum of Radii Clustering with Mergeable Constraints}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)},
  pages =	{23:1--23:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-397-3},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{353},
  editor =	{Ene, Alina and Chattopadhyay, Eshan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2025.23},
  URN =		{urn:nbn:de:0030-drops-243894},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2025.23},
  annote =	{Keywords: sum-of-radii clustering, mergeable constraints, approximation algorithm}
}
Document
Towards Free Lunch Derandomization from Necessary Assumptions (And OWFs)

Authors: Marshall Ball, Lijie Chen, and Roei Tell

Published in: LIPIcs, Volume 339, 40th Computational Complexity Conference (CCC 2025)


Abstract
The question of optimal derandomization, introduced by Doron et al. (JACM 2022), has garnered significant recent attention. Works in recent years showed conditional superfast derandomization algorithms, as well as conditional impossibility results and barriers for obtaining superfast derandomization using certain black-box techniques. Of particular interest is the extreme high end, which focuses on "free lunch" derandomization, as suggested by Chen and Tell (FOCS 2021). This is derandomization that incurs essentially no time overhead and errs only on inputs that are infeasible to find. Constructing such algorithms is challenging, and so far no results have followed the one in their initial work. In that result, their algorithm is essentially the classical Nisan-Wigderson generator, and they relied on an ad-hoc assumption asserting the existence of a function that is non-batch-computable over all polynomial-time samplable distributions. In this work we deduce free lunch derandomization from a variety of natural hardness assumptions. In particular, we do not resort to non-batch-computability, and the common denominator of all of our assumptions is hardness over all polynomial-time samplable distributions, which is necessary for the conclusion. The main technical components in our proofs are constructions of new and superfast targeted generators, which completely eliminate the time overheads inherent to all previously known constructions. In particular, we present an alternative construction for the targeted generator of Chen and Tell (FOCS 2021), which is faster than the original construction and also more natural and technically intuitive. These contributions significantly strengthen the evidence for the possibility of free lunch derandomization, distill the assumptions required for such a result, and provide the first set of dedicated technical tools useful for studying the question.

Cite as

Marshall Ball, Lijie Chen, and Roei Tell. Towards Free Lunch Derandomization from Necessary Assumptions (And OWFs). In 40th Computational Complexity Conference (CCC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 339, pp. 31:1-31:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{ball_et_al:LIPIcs.CCC.2025.31,
  author =	{Ball, Marshall and Chen, Lijie and Tell, Roei},
  title =	{{Towards Free Lunch Derandomization from Necessary Assumptions (And OWFs)}},
  booktitle =	{40th Computational Complexity Conference (CCC 2025)},
  pages =	{31:1--31:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-379-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{339},
  editor =	{Srinivasan, Srikanth},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2025.31},
  URN =		{urn:nbn:de:0030-drops-237259},
  doi =		{10.4230/LIPIcs.CCC.2025.31},
  annote =	{Keywords: Pseudorandomness, Derandomization}
}
Document
Track A: Algorithms, Complexity and Games
Approximating Dasgupta Cost in Sublinear Time from a Few Random Seeds

Authors: Michael Kapralov, Akash Kumar, Silvio Lattanzi, Aida Mousavifar, and Weronika Wrzos-Kaminska

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)


Abstract
Testing graph cluster structure has been a central object of study in property testing since the foundational work of Goldreich and Ron [STOC'96] on expansion testing, i.e., the problem of distinguishing between a single cluster (an expander) and a graph that is far from a single cluster. More generally, a (k, ε)-clusterable graph G is a graph whose vertex set admits a partition into k induced expanders, each with outer conductance bounded by ε. A recent line of work initiated by Czumaj, Peng and Sohler [STOC'15] has shown how to test whether a graph is close to (k, ε)-clusterable, and to locally determine which cluster a given vertex belongs to with misclassification rate ≈ ε, but no sublinear time algorithms for learning the structure of inter-cluster connections are known. As a simple example, can one locally distinguish between the "cluster graph" forming a line and a clique? In this paper, we consider the problem of testing the hierarchical cluster structure of (k, ε)-clusterable graphs in sublinear time. Our measure of hierarchical clusterability is the well-established Dasgupta cost, and our main result is an algorithm that approximates the Dasgupta cost of a (k, ε)-clusterable graph in sublinear time, using a small number of randomly chosen seed vertices for which cluster labels are known. Specifically, we obtain an O(√{log k})-approximation to the Dasgupta cost of G in ≈ n^{1/2+O(ε)} time using ≈ n^{1/3} seeds, effectively giving a sublinear-time simulation of the algorithm of Charikar and Chatziafratis [SODA'17] on clusterable graphs. To the best of our knowledge, ours is the first result on approximating the hierarchical clustering properties of such graphs in sublinear time.
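
A minimal sketch of the Dasgupta cost used as the hierarchical-clustering objective above: each graph edge (u, v) is charged its weight times the number of leaves under the lowest common ancestor of u and v in the hierarchy tree. The nested-tuple tree encoding and the toy example are illustrative only.

def dasgupta_cost(tree, edges):
    """tree: a leaf label, or a tuple (left_subtree, right_subtree).
    edges: iterable of (u, v, weight)."""
    def leaves(t):
        return {t} if not isinstance(t, tuple) else leaves(t[0]) | leaves(t[1])

    def lca_leaf_count(t, u, v):
        # Walk down while both endpoints stay inside the same child.
        while isinstance(t, tuple):
            left = leaves(t[0])
            if u in left and v in left:
                t = t[0]
            elif u not in left and v not in left:
                t = t[1]
            else:
                break
        return len(leaves(t))

    return sum(w * lca_leaf_count(tree, u, v) for u, v, w in edges)

# Example: a path a-b-c clustered as ((a, b), c) has cost 1*2 + 1*3 = 5.
print(dasgupta_cost((("a", "b"), "c"), [("a", "b", 1.0), ("b", "c", 1.0)]))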

Cite as

Michael Kapralov, Akash Kumar, Silvio Lattanzi, Aida Mousavifar, and Weronika Wrzos-Kaminska. Approximating Dasgupta Cost in Sublinear Time from a Few Random Seeds. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 103:1-103:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{kapralov_et_al:LIPIcs.ICALP.2025.103,
  author =	{Kapralov, Michael and Kumar, Akash and Lattanzi, Silvio and Mousavifar, Aida and Wrzos-Kaminska, Weronika},
  title =	{{Approximating Dasgupta Cost in Sublinear Time from a Few Random Seeds}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{103:1--103:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.103},
  URN =		{urn:nbn:de:0030-drops-234804},
  doi =		{10.4230/LIPIcs.ICALP.2025.103},
  annote =	{Keywords: Sublinear algorithms, Hierarchical Clustering, Dasgupta’s Cost}
}
Document
Track A: Algorithms, Complexity and Games
A Pseudorandom Generator for Functions of Low-Degree Polynomial Threshold Functions

Authors: Penghui Yao and Mingnan Zhao

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)


Abstract
Developing explicit pseudorandom generators (PRGs) for prominent categories of Boolean functions is a key focus in computational complexity theory. In this paper, we investigate PRGs against functions of degree-d polynomial threshold functions (PTFs) over Gaussian space. Our main result is an explicit construction of a PRG with seed length poly(k, d, 1/ε)⋅log n that can fool any function of k degree-d PTFs with probability at least 1 - ε. More specifically, we show that the sum of L independent R-moment-matching Gaussian vectors ε-fools functions of k degree-d PTFs, where L = poly(k, d, 1/ε) and R = O(log(kd/ε)). The PRG is then obtained by applying an appropriate discretization to Gaussian vectors with bounded independence.
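
A small, hedged sketch of the objects the PRG above is meant to fool: a degree-d polynomial threshold function (PTF) over Gaussian inputs, and a Boolean combination of k such PTFs. The specific polynomials and the AND combination are illustrative assumptions, not constructions from the paper.

import numpy as np

def ptf(coeffs, x):
    """Sign of a polynomial in x; coeffs maps a tuple of variable indices
    (one degree-d monomial) to its coefficient."""
    value = sum(c * np.prod(x[list(mono)]) for mono, c in coeffs.items())
    return 1 if value >= 0 else -1

rng = np.random.default_rng(0)
x = rng.standard_normal(5)                  # a genuine Gaussian input vector
p1 = {(0,): 1.0, (1, 1): -0.5}              # x_0 - 0.5*x_1^2
p2 = {(2, 3): 2.0, (4,): -1.0}              # 2*x_2*x_3 - x_4

# A "function of k PTFs": here k = 2 PTFs combined by AND.
f = (ptf(p1, x) == 1) and (ptf(p2, x) == 1)
print(f)
# A PRG with seed length poly(k, d, 1/eps)*log n would replace x by a
# pseudorandom vector while changing Pr[f = 1] by at most eps.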

Cite as

Penghui Yao and Mingnan Zhao. A Pseudorandom Generator for Functions of Low-Degree Polynomial Threshold Functions. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 134:1-134:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{yao_et_al:LIPIcs.ICALP.2025.134,
  author =	{Yao, Penghui and Zhao, Mingnan},
  title =	{{A Pseudorandom Generator for Functions of Low-Degree Polynomial Threshold Functions}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{134:1--134:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.134},
  URN =		{urn:nbn:de:0030-drops-235112},
  doi =		{10.4230/LIPIcs.ICALP.2025.134},
  annote =	{Keywords: Pseudorandom generators, polynomial threshold functions}
}
Document
Track A: Algorithms, Complexity and Games
Deterministic k-Median Clustering in Near-Optimal Time

Authors: Martín Costa and Ermiya Farokhnejad

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)


Abstract
The metric k-median problem is a textbook clustering problem. As input, we are given a metric space V of size n and an integer k, and our task is to find a subset S ⊆ V of at most k "centers" that minimizes the total distance from each point in V to its nearest center in S. Mettu and Plaxton [UAI'02] gave a randomized algorithm for k-median that computes an O(1)-approximation in Õ(nk) time. They also showed that any algorithm for this problem with a bounded approximation ratio must have a running time of Ω(nk). Thus, the running time of their algorithm is optimal up to polylogarithmic factors. For deterministic k-median, Guha et al. [FOCS'00] gave an algorithm that computes a poly(log(n/k))-approximation in Õ(nk) time, where the degree of the polynomial in the approximation is unspecified. To the best of our knowledge, this remains the state-of-the-art approximation of any deterministic k-median algorithm with this running time. This leads us to the following natural question: What is the best approximation of a deterministic k-median algorithm with near-optimal running time? We make progress in answering this question by giving a deterministic algorithm that computes an O(log(n/k))-approximation in Õ(nk) time. We also provide a lower bound showing that any deterministic algorithm with this running time must have an approximation ratio of Ω(log n/(log k + log log n)), establishing a gap between the randomized and deterministic settings for k-median.
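
A minimal sketch of the k-median objective described above: the cost of a center set S is the sum, over all points, of the distance to the nearest center. The Euclidean metric and the toy data are used purely for illustration.

import math

def kmedian_cost(points, centers):
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return sum(min(dist(p, c) for c in centers) for p in points)

points = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(kmedian_cost(points, centers=[(0.5, 0), (10.5, 0)]))  # 4 * 0.5 = 2.0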

Cite as

Martín Costa and Ermiya Farokhnejad. Deterministic k-Median Clustering in Near-Optimal Time. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 62:1-62:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{costa_et_al:LIPIcs.ICALP.2025.62,
  author =	{Costa, Mart{\'\i}n and Farokhnejad, Ermiya},
  title =	{{Deterministic k-Median Clustering in Near-Optimal Time}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{62:1--62:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.62},
  URN =		{urn:nbn:de:0030-drops-234395},
  doi =		{10.4230/LIPIcs.ICALP.2025.62},
  annote =	{Keywords: k-clustering, k-median, deterministic algorithms, approximation algorithms}
}
Document
Track A: Algorithms, Complexity and Games
Robust-Sorting and Applications to Ulam-Median

Authors: Ragesh Jaiswal, Amit Kumar, and Jatin Yadav

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)


Abstract
Sorting is one of the most basic primitives in many algorithms and data analysis tasks. Comparison-based sorting algorithms, like quick-sort and merge-sort, are known to be optimal when the outcome of each comparison is error-free. However, many real-world sorting applications operate in scenarios where the outcome of each comparison can be noisy. In this work, we explore settings where a bounded number of comparisons are potentially corrupted by erroneous agents, resulting in arbitrary, adversarial outcomes. We model the sorting problem as a query-limited tournament graph where edges involving erroneous nodes may yield arbitrary results. Our primary contribution is a randomized algorithm inspired by quick-sort that, in expectation, produces an ordering close to the true total order while querying only Õ(n) edges. We achieve a distance from the target order π within (3 + ε)|B|, where B is the set of erroneous nodes, balancing the competing objectives of minimizing both query complexity and misalignment with π. Our algorithm needs to carefully balance two aspects: identifying a pivot that partitions the vertex set evenly and ensuring that this partition is "truthful", while querying as few "triangles" in the graph G as possible. Since the nodes in B can potentially hide in an intricate manner, our algorithm requires several technical steps to ensure that progress is made in each recursive step. Additionally, we demonstrate significant implications for the Ulam-k-Median problem. This is a classical clustering problem where the metric is defined on the set of permutations of d elements. Chakraborty, Das, and Krauthgamer gave a (2-ε)-approximation FPT algorithm for this problem, where the running time is super-linear in both n and d. We give the first (2-ε)-approximation FPT algorithm for this problem that runs in linear time. Our main technical result strengthens the results of Chakraborty et al. by showing that a good 1-median solution can be obtained from a constant-size random sample of the input. We use our robust sorting framework to find a good solution from such a random sample. We believe that the notion of robust sorting should have applications in several such settings.
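
A hedged sketch of the noisy-comparison model above: the sorter only sees a comparison oracle, which may answer arbitrarily whenever an erroneous node from B is involved. The pivot-based recursion below is plain quick-sort, not the paper's algorithm; it only illustrates how corrupted answers can distort the output order.

import random

def quicksort_with_oracle(items, prefers):
    """prefers(u, v) -> True if the oracle says u should come before v."""
    if len(items) <= 1:
        return items
    i = random.randrange(len(items))
    pivot, rest = items[i], items[:i] + items[i + 1:]
    left = [x for x in rest if prefers(x, pivot)]
    right = [x for x in rest if not prefers(x, pivot)]
    return quicksort_with_oracle(left, prefers) + [pivot] + quicksort_with_oracle(right, prefers)

B = {7}  # erroneous nodes: comparisons touching them can be adversarial
def noisy_prefers(u, v):
    if u in B or v in B:
        return random.random() < 0.5   # arbitrary (here: random) answer
    return u < v                       # truthful answer

print(quicksort_with_oracle(list(range(10)), noisy_prefers))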

Cite as

Ragesh Jaiswal, Amit Kumar, and Jatin Yadav. Robust-Sorting and Applications to Ulam-Median. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 100:1-100:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{jaiswal_et_al:LIPIcs.ICALP.2025.100,
  author =	{Jaiswal, Ragesh and Kumar, Amit and Yadav, Jatin},
  title =	{{Robust-Sorting and Applications to Ulam-Median}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{100:1--100:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.100},
  URN =		{urn:nbn:de:0030-drops-234774},
  doi =		{10.4230/LIPIcs.ICALP.2025.100},
  annote =	{Keywords: Sorting, clustering, query complexity}
}
Document
FPT Approximation for Capacitated Sum of Radii

Authors: Ragesh Jaiswal, Amit Kumar, and Jatin Yadav

Published in: LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)


Abstract
We consider the capacitated clustering problem in general metric spaces where the goal is to identify k clusters and minimize the sum of the radii of the clusters (we call this the Capacitated k-sumRadii problem). We are interested in fixed-parameter tractable (FPT) approximation algorithms whose running time is of the form f(k) ⋅ poly(n), where f(k) can be an exponential function of k and n is the number of points in the input. In the uniform capacity case, Bandyapadhyay et al. recently gave a 4-approximation algorithm for this problem. Our first result improves this to an FPT 3-approximation and extends to a constant factor approximation for any L_p norm of the cluster radii. In the general capacities version, Bandyapadhyay et al. gave an FPT 15-approximation algorithm. We extend their framework to give an FPT (4 + √13)-approximation algorithm for this problem. Our framework relies on a novel idea of identifying approximations to optimal clusters by carefully pruning points from an initial candidate set of points. This is in contrast to prior results that rely on guessing suitable points and building balls of appropriate radii around them. On the hardness front, we show that, assuming the Exponential Time Hypothesis, there is a constant c > 1 such that any c-approximation algorithm for the non-uniform capacity version of this problem requires running time 2^{Ω(k/polylog(k))}.
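
A minimal sketch of the Capacitated k-sumRadii objective: given an assignment of points to cluster centers that respects a capacity, the cost is the sum over clusters of the radius, i.e., the maximum distance from a center to its assigned points. The Euclidean metric, names, and example data are illustrative assumptions.

import math

def sum_of_radii(assignment, capacity):
    """assignment: dict center -> list of points assigned to it."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    total = 0.0
    for center, cluster in assignment.items():
        if len(cluster) > capacity:
            raise ValueError("capacity violated")
        total += max((dist(center, p) for p in cluster), default=0.0)
    return total

assignment = {(0, 0): [(0, 0), (1, 0)], (5, 5): [(5, 5), (5, 6), (6, 5)]}
print(sum_of_radii(assignment, capacity=3))  # 1.0 + 1.0 = 2.0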

Cite as

Ragesh Jaiswal, Amit Kumar, and Jatin Yadav. FPT Approximation for Capacitated Sum of Radii. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 65:1-65:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{jaiswal_et_al:LIPIcs.ITCS.2024.65,
  author =	{Jaiswal, Ragesh and Kumar, Amit and Yadav, Jatin},
  title =	{{FPT Approximation for Capacitated Sum of Radii}},
  booktitle =	{15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =	{65:1--65:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-309-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{287},
  editor =	{Guruswami, Venkatesan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.65},
  URN =		{urn:nbn:de:0030-drops-195937},
  doi =		{10.4230/LIPIcs.ITCS.2024.65},
  annote =	{Keywords: Approximation algorithm, parameterized algorithm, clustering}
}
Document
Clustering What Matters in Constrained Settings: Improved Outlier to Outlier-Free Reductions

Authors: Ragesh Jaiswal and Amit Kumar

Published in: LIPIcs, Volume 283, 34th International Symposium on Algorithms and Computation (ISAAC 2023)


Abstract
Constrained clustering problems generalize classical clustering formulations, e.g., k-median, k-means, by imposing additional constraints on the feasibility of a clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, both in the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out m points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained k-median or k-means problem to the corresponding outlier-free version with only a (1+ε)-factor loss in the approximation ratio. The reduction is obtained by mapping the original instance of the problem to f(k, m, ε) instances of the outlier-free version, where f(k, m, ε) = ((k+m)/ε)^{O(m)}. As specific applications, we get the following results: - The first FPT (in the parameters k and m) (1+ε)-approximation algorithm for the outlier version of capacitated k-median and k-means in Euclidean spaces with hard capacities. - The first FPT (in the parameters k and m) (3+ε)- and (9+ε)-approximation algorithms for the outlier version of capacitated k-median and k-means, respectively, in general metric spaces with hard capacities. - The first FPT (in the parameters k and m) (2-δ)-approximation algorithm for the outlier version of the k-median problem under the Ulam metric. Our work generalizes the results of Bhattacharya et al. and Agrawal et al. to a larger class of constrained clustering problems. Further, our reduction works in arbitrary metric spaces, and so it extends clustering algorithms for the outlier-free versions in both Euclidean and general metric spaces.

Cite as

Ragesh Jaiswal and Amit Kumar. Clustering What Matters in Constrained Settings: Improved Outlier to Outlier-Free Reductions. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 41:1-41:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{jaiswal_et_al:LIPIcs.ISAAC.2023.41,
  author =	{Jaiswal, Ragesh and Kumar, Amit},
  title =	{{Clustering What Matters in Constrained Settings: Improved Outlier to Outlier-Free Reductions}},
  booktitle =	{34th International Symposium on Algorithms and Computation (ISAAC 2023)},
  pages =	{41:1--41:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-289-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{283},
  editor =	{Iwata, Satoru and Kakimura, Naonori},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2023.41},
  URN =		{urn:nbn:de:0030-drops-193433},
  doi =		{10.4230/LIPIcs.ISAAC.2023.41},
  annote =	{Keywords: clustering, constrained, outlier}
}
Document
APPROX
Hardness of Approximation for Euclidean k-Median

Authors: Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal

Published in: LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)


Abstract
The Euclidean k-median problem is defined in the following manner: given a set 𝒳 of n points in d-dimensional Euclidean space ℝ^d and an integer k, find a set C ⊂ ℝ^d of k points (called centers) such that the cost function Φ(C,𝒳) ≡ ∑_{x ∈ 𝒳} min_{c ∈ C} ‖x-c‖₂ is minimized. The Euclidean k-means problem is defined similarly by replacing the distance with the squared Euclidean distance in the cost function. Various hardness of approximation results are known for the Euclidean k-means problem [Pranjal Awasthi et al., 2015; Euiwoong Lee et al., 2017; Vincent Cohen-Addad and Karthik C. S., 2019]. However, no hardness of approximation result was known for the Euclidean k-median problem. In this work, assuming the unique games conjecture (UGC), we provide a hardness of approximation result for the Euclidean k-median problem in O(log k)-dimensional space. This solves an open question posed explicitly in the work of Awasthi et al. [Pranjal Awasthi et al., 2015]. Furthermore, we study hardness of approximation for the Euclidean k-means/k-median problems in the bi-criteria setting where an algorithm is allowed to choose more than k centers. That is, bi-criteria approximation algorithms are allowed to output βk centers (for a constant β > 1), and the approximation ratio is computed with respect to the optimal k-means/k-median cost. We show a bi-criteria hardness of approximation result for the Euclidean k-median problem for any β < 1.015, assuming UGC. We also show a similar bi-criteria hardness of approximation result for the Euclidean k-means problem with a stronger bound of β < 1.28, again assuming UGC.

Cite as

Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal. Hardness of Approximation for Euclidean k-Median. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 4:1-4:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


BibTeX

@InProceedings{bhattacharya_et_al:LIPIcs.APPROX/RANDOM.2021.4,
  author =	{Bhattacharya, Anup and Goyal, Dishant and Jaiswal, Ragesh},
  title =	{{Hardness of Approximation for Euclidean k-Median}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)},
  pages =	{4:1--4:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-207-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{207},
  editor =	{Wootters, Mary and Sanit\`{a}, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.4},
  URN =		{urn:nbn:de:0030-drops-146979},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2021.4},
  annote =	{Keywords: Hardness of approximation, bicriteria approximation, approximation algorithms, k-median, k-means}
}
Document
FPT Approximation for Constrained Metric k-Median/Means

Authors: Dishant Goyal, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 180, 15th International Symposium on Parameterized and Exact Computation (IPEC 2020)


Abstract
The Metric k-median problem over a metric space (𝒳, d) is defined as follows: given a set L ⊆ 𝒳 of facility locations and a set C ⊆ 𝒳 of clients, open a set F ⊆ L of k facilities such that the total service cost, defined as Φ(F, C) := ∑_{x ∈ C} min_{f ∈ F} d(x, f), is minimised. The metric k-means problem is defined similarly using squared distances (i.e., d²(., .) instead of d(., .)). In many applications there are additional constraints that any solution needs to satisfy. For example, to balance the load among the facilities in resource allocation problems, a capacity u is imposed on every facility. That is, no more than u clients can be assigned to any facility. This problem is known as the capacitated k-means/k-median problem. Likewise, various other applications have different constraints, which give rise to different constrained versions of the problem such as the r-gather, fault-tolerant, and outlier k-means/k-median problems. Surprisingly, for many of these constrained problems, no constant-approximation algorithm is known. Moreover, the unconstrained problem itself is known [Marek Adamczyk et al., 2019] to be W[2]-hard when parameterized by k. We give FPT algorithms with constant approximation guarantees for a range of constrained k-median/means problems. For some of the constrained problems, ours is the first constant factor approximation algorithm, whereas for others we improve or match the approximation guarantee of previous works. We work within the unified framework of Ding and Xu [Ding and Xu, 2015] that allows us to simultaneously obtain algorithms for a range of constrained problems. In particular, we obtain a (3+ε)-approximation and a (9+ε)-approximation for the constrained versions of the k-median and k-means problem, respectively, in FPT time. In many practical settings of the k-median/means problem, one is allowed to open a facility at any client location, i.e., C ⊆ L. For this special case, our algorithm gives a (2+ε)-approximation and a (4+ε)-approximation for the constrained versions of the k-median and k-means problem, respectively, in FPT time. Since our algorithm is based on a simple sampling technique, it can also be converted to a constant-pass log-space streaming algorithm. In particular, here are some of the main highlights of this work: 1) For the uniform capacitated k-median/means problems, our results match the previously known results of Cohen-Addad and Li [Vincent Cohen-Addad and Jason Li, 2019]. 2) For the r-gather k-median/means problem (clustering with a lower bound on the size of clusters), our FPT approximation bounds are better than what was previously known. 3) Our approximation bounds for the fault-tolerant, outlier, and uncertain versions are better than all previously known results, albeit in FPT time. 4) For certain constrained settings such as chromatic, l-diversity, and semi-supervised k-median/means, we obtain the first constant factor approximation algorithms to the best of our knowledge. 5) Since our algorithms are based on a simple sampling based approach, we also obtain constant-pass log-space streaming algorithms for most of the above-mentioned problems.

Cite as

Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. FPT Approximation for Constrained Metric k-Median/Means. In 15th International Symposium on Parameterized and Exact Computation (IPEC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 180, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{goyal_et_al:LIPIcs.IPEC.2020.14,
  author =	{Goyal, Dishant and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{FPT Approximation for Constrained Metric k-Median/Means}},
  booktitle =	{15th International Symposium on Parameterized and Exact Computation (IPEC 2020)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-172-6},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{180},
  editor =	{Cao, Yixin and Pilipczuk, Marcin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.IPEC.2020.14},
  URN =		{urn:nbn:de:0030-drops-133170},
  doi =		{10.4230/LIPIcs.IPEC.2020.14},
  annote =	{Keywords: k-means, k-median, approximation algorithms, parameterised algorithms}
}
Document
On Sampling Based Algorithms for k-Means

Authors: Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 182, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)


Abstract
We generalise the results of Bhattacharya et al. [Bhattacharya et al., 2018] for the list-k-means problem, defined as follows: for an (unknown) partition X₁, ..., X_k of the dataset X ⊆ ℝ^d, find a list of k-center-sets (each element in the list is a set of k centers) such that at least one k-center-set {c₁, ..., c_k} in the list gives a (1+ε)-approximation with respect to the cost function min_{permutation π} [∑_{i = 1}^{k} ∑_{x ∈ X_i} ||x - c_{π(i)}||²]. The list-k-means problem is important for the constrained k-means problem since algorithms for the former can be converted to PTASs for various versions of the latter. The algorithm for the list-k-means problem by Bhattacharya et al. is a D²-sampling based algorithm that runs in k iterations. Making use of a constant factor solution for the (classical or unconstrained) k-means problem, we generalise the algorithm of Bhattacharya et al. in two ways: (i) for any fixed set X_{j₁}, ..., X_{j_t} of t ≤ k clusters, the algorithm produces a list of (k/ε)^{O(t/ε)} t-center sets such that (w.h.p.) at least one of them is good for X_{j₁}, ..., X_{j_t}, and (ii) the algorithm runs in a single iteration. The following are the consequences of our generalisations: 1) Faster PTAS under stability and a parameterised reduction: Property (i) of our generalisation is useful in scenarios where finding good centers becomes easier once good centers for a few "bad" clusters have been chosen. One such case is clustering under the stability condition of Awasthi et al. [Awasthi et al., 2010], where the number of such bad clusters is a constant. Using property (i), we significantly improve the running time of their algorithm from O(dn³ (k log n)^{poly(1/β, 1/ε)}) to O(dn³ (k/ε)^{O(1/βε²)}). Another application is a parameterised reduction from the outlier version of k-means to the classical one where the bad clusters are the outliers. 2) Streaming algorithms: The sampling algorithm running in a single iteration (i.e., property (ii)) allows us to design a constant-pass, logspace streaming algorithm for the list-k-means problem. This can be converted to a constant-pass, logspace streaming PTAS for various constrained versions of the k-means problem. In particular, this gives a 3-pass, polylog-space streaming PTAS for the constrained binary k-means problem, which in turn gives a 4-pass, polylog-space streaming PTAS for the generalised binary 𝓁₀-rank-r approximation problem. This is the first constant-pass, polylog-space streaming algorithm for either of the two problems. Coreset-based techniques, another general approach for designing streaming algorithms, are not known to work for the constrained binary k-means problem to the best of our knowledge.
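
A minimal sketch of D²-sampling (k-means++ seeding), the primitive the list-k-means algorithm above builds on: each new center is drawn with probability proportional to the squared distance to the closest center chosen so far. Only the standard library is used; the toy data and names are illustrative.

import random

def d2_sample_centers(points, k):
    centers = [random.choice(points)]                      # first center: uniform
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        total = sum(d2)
        r, acc = random.uniform(0, total), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)                          # sampled prop. to d2
                break
    return centers

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(d2_sample_centers(points, k=2))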

Cite as

Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. On Sampling Based Algorithms for k-Means. In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 182, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{bhattacharya_et_al:LIPIcs.FSTTCS.2020.13,
  author =	{Bhattacharya, Anup and Goyal, Dishant and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{On Sampling Based Algorithms for k-Means}},
  booktitle =	{40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-174-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{182},
  editor =	{Saxena, Nitin and Simon, Sunil},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2020.13},
  URN =		{urn:nbn:de:0030-drops-132549},
  doi =		{10.4230/LIPIcs.FSTTCS.2020.13},
  annote =	{Keywords: k-means, low rank approximation}
}
Document
Approximate Clustering with Same-Cluster Queries

Authors: Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 94, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)


Abstract
Ashtiani et al. proposed a Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to make adaptive queries to a domain expert. The queries are of the kind "do two given points belong to the same optimal cluster?", and the answers to these queries are assumed to be consistent with a unique optimal solution. There are many clustering contexts where such same-cluster queries are feasible. Ashtiani et al. exhibited the power of such queries by showing that any instance of the k-means clustering problem, with an additional margin assumption, can be solved efficiently if one is allowed to make O(k² log k + k log n) same-cluster queries. This is interesting since the k-means problem, even with the margin assumption, is NP-hard. In this paper, we extend the work of Ashtiani et al. to the approximation setting by showing that a few such same-cluster queries enable one to obtain a polynomial-time (1+ε)-approximation algorithm for the k-means problem without any margin assumption on the input dataset. Again, this is interesting since the k-means problem is NP-hard to approximate within a factor (1+c) for a fixed constant 0 < c < 1. The number of same-cluster queries used by the algorithm is poly(k/ε), which is independent of the size n of the dataset. Our algorithm is based on the D²-sampling technique, also known as the k-means++ seeding algorithm. We also give a conditional lower bound on the number of same-cluster queries, showing that if the Exponential Time Hypothesis (ETH) holds, then any such efficient query algorithm needs to make Ω(k/polylog k) same-cluster queries. Our algorithm can be extended to the case where the query answers are wrong with some bounded probability. Another result we show for k-means++ seeding is that a small modification of the k-means++ seeding within the SSAC framework converts it into a constant factor approximation algorithm instead of the well-known O(log k)-approximation algorithm.
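
A small sketch of the same-cluster query interface used in the SSAC framework above: the learner asks whether two points lie in the same optimal cluster, and the oracle answers consistently with one fixed (hidden) optimal clustering. The class name and the ground-truth labels are illustrative assumptions.

class SameClusterOracle:
    def __init__(self, optimal_label):
        self._label = optimal_label          # hidden dict: point id -> cluster id
        self.queries_used = 0

    def same_cluster(self, u, v) -> bool:
        self.queries_used += 1
        return self._label[u] == self._label[v]

oracle = SameClusterOracle({"a": 0, "b": 0, "c": 1})
print(oracle.same_cluster("a", "b"), oracle.same_cluster("a", "c"))  # True False
print(oracle.queries_used)  # a query-efficient algorithm keeps this poly(k/eps)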

Cite as

Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. Approximate Clustering with Same-Cluster Queries. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 40:1-40:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{ailon_et_al:LIPIcs.ITCS.2018.40,
  author =	{Ailon, Nir and Bhattacharya, Anup and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{Approximate Clustering with Same-Cluster Queries}},
  booktitle =	{9th Innovations in Theoretical Computer Science Conference (ITCS 2018)},
  pages =	{40:1--40:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-060-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{94},
  editor =	{Karlin, Anna R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2018.40},
  URN =		{urn:nbn:de:0030-drops-83358},
  doi =		{10.4230/LIPIcs.ITCS.2018.40},
  annote =	{Keywords: k-means, semi-supervised learning, query bounds}
}
Document
Faster Algorithms for the Constrained k-Means Problem

Authors: Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 47, 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)


Abstract
The classical center-based clustering problems such as k-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. For instance, consider the r-gather clustering problem, where there is an additional constraint that each of the clusters should have at least r points, or the capacitated clustering problem, where there is an upper bound on the cluster sizes. Consider a variant of the k-means problem that may be regarded as a general version of such problems. Here, the optimal clusters O₁, ..., O_k are an arbitrary partition of the dataset, and the goal is to output k centers c₁, ..., c_k such that the objective function ∑_{i=1}^{k} ∑_{x ∈ O_i} ||x - c_i||² is minimized. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of k centers will not behave well as far as optimizing the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such k centers such that at least one of these k centers behaves well. Given an error parameter ε > 0, let l denote the size of the smallest list of k-centers such that at least one of the k-centers gives a (1+ε)-approximation w.r.t. the objective function above. In this paper, we show an upper bound on l by giving a randomized algorithm that outputs a list of 2^{Õ(k/ε)} k-centers. We also give a closely matching lower bound of 2^{Ω̃(k/√ε)}. Moreover, our algorithm runs in time O(n⋅d⋅2^{Õ(k/ε)}). This is a significant improvement over the previous result of Ding and Xu, who gave an algorithm with running time O(n⋅d⋅(log n)^k⋅2^{poly(k/ε)}) and output a list of size O((log n)^k⋅2^{poly(k/ε)}). Our techniques generalize for the k-median problem and for many other settings where non-Euclidean distance measures are involved.

Cite as

Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. Faster Algorithms for the Constrained k-Means Problem. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 16:1-16:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{bhattacharya_et_al:LIPIcs.STACS.2016.16,
  author =	{Bhattacharya, Anup and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{Faster Algorithms for the Constrained k-Means Problem}},
  booktitle =	{33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)},
  pages =	{16:1--16:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-001-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{47},
  editor =	{Ollinger, Nicolas and Vollmer, Heribert},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2016.16},
  URN =		{urn:nbn:de:0030-drops-57179},
  doi =		{10.4230/LIPIcs.STACS.2016.16},
  annote =	{Keywords: k-means, k-median, approximation algorithm, sampling}
}
  • Refine by Type
  • 14 Document/PDF
  • 7 Document/HTML

  • Refine by Publication Year
  • 7 2025
  • 1 2024
  • 1 2023
  • 1 2021
  • 2 2020
  • Show More...

  • Refine by Author
  • 8 Jaiswal, Ragesh
  • 7 Kumar, Amit
  • 4 Bhattacharya, Anup
  • 3 Goyal, Dishant
  • 2 Bandyapadhyay, Sayan
  • Show More...

  • Refine by Series/Journal
  • 14 LIPIcs

  • Refine by Classification
  • 6 Theory of computation → Facility location and clustering
  • 3 Theory of computation → Approximation algorithms analysis
  • 3 Theory of computation → Streaming, sublinear and near linear time algorithms
  • 2 Theory of computation → Pseudorandomness and derandomization
  • 1 Theory of computation → Computational complexity and cryptography
  • Show More...

  • Refine by Keyword
  • 5 k-means
  • 4 approximation algorithms
  • 4 k-median
  • 3 clustering
  • 2 approximation algorithm
  • Show More...
