Document

**Published in:** LIPIcs, Volume 293, 40th International Symposium on Computational Geometry (SoCG 2024)

Geometric matching is an important topic in computational geometry and has been extensively studied over decades. In this paper, we study a geometric-matching problem, known as geometric many-to-many matching. In this problem, the input is a set S of n colored points in ℝ^d, which implicitly defines a graph G = (S,E(S)) where E(S) = {(p,q): p,q ∈ S have different colors}, and the goal is to compute a minimum-cost subset E^* ⊆ E(S) of edges that cover all points in S. Here the cost of E^* is the sum of the costs of all edges in E^*, where the cost of a single edge e is the Euclidean distance (or more generally, the L_p-distance) between the two endpoints of e. Our main result is a (1+ε)-approximation algorithm with an optimal running time O_ε(n log n) for geometric many-to-many matching in any fixed dimension, which works under any L_p-norm. This is the first near-linear approximation scheme for the problem in any d ≥ 2. Prior to this work, only the bipartite case of geometric many-to-many matching was considered in ℝ¹ and ℝ², and the best known approximation scheme in ℝ² takes O_ε(n^{1.5} ⋅ poly(log n)) time.

Sayan Bandyapadhyay and Jie Xue. An O(n log n)-Time Approximation Scheme for Geometric Many-To-Many Matching. In 40th International Symposium on Computational Geometry (SoCG 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 293, pp. 12:1-12:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.SoCG.2024.12, author = {Bandyapadhyay, Sayan and Xue, Jie}, title = {{An O(n log n)-Time Approximation Scheme for Geometric Many-To-Many Matching}}, booktitle = {40th International Symposium on Computational Geometry (SoCG 2024)}, pages = {12:1--12:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-316-4}, ISSN = {1868-8969}, year = {2024}, volume = {293}, editor = {Mulzer, Wolfgang and Phillips, Jeff M.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2024.12}, URN = {urn:nbn:de:0030-drops-199577}, doi = {10.4230/LIPIcs.SoCG.2024.12}, annote = {Keywords: many-to-many matching, geometric optimization, approximation algorithms} }

Document

**Published in:** LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

In this work, we address the following question. Suppose we are given a set D of positive-weighted disks and a set T of n points in the plane, such that each point of T is contained in at least two disks of D. Then is there always a subset S of D such that the union of the disks in S contains all the points of T and the total weight of the disks of D that are not in S is at least a constant fraction of the total weight of the disks in D?
In our work, we prove the Extraction Theorem that answers this question in the affirmative. Our constructive proof heavily exploits the geometry of disks, and in the process, we make interesting connections between our work and the literature on local search for geometric optimization problems.
The Extraction Theorem helps to design the first polynomial-time O(1)-approximations for two important geometric covering problems involving disks.

Sayan Bandyapadhyay, Anil Maheshwari, Sasanka Roy, Michiel Smid, and Kasturi Varadarajan. Geometric Covering via Extraction Theorem. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.ITCS.2024.7, author = {Bandyapadhyay, Sayan and Maheshwari, Anil and Roy, Sasanka and Smid, Michiel and Varadarajan, Kasturi}, title = {{Geometric Covering via Extraction Theorem}}, booktitle = {15th Innovations in Theoretical Computer Science Conference (ITCS 2024)}, pages = {7:1--7:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-309-6}, ISSN = {1868-8969}, year = {2024}, volume = {287}, editor = {Guruswami, Venkatesan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.7}, URN = {urn:nbn:de:0030-drops-195355}, doi = {10.4230/LIPIcs.ITCS.2024.7}, annote = {Keywords: Covering, Extraction theorem, Double-disks, Submodularity, Local search} }

Document

**Published in:** LIPIcs, Volume 258, 39th International Symposium on Computational Geometry (SoCG 2023)

Designing coresets - small-space sketches of the data preserving cost of the solutions within (1± ε)-approximate factor - is an important research direction in the study of center-based k-clustering problems, such as k-means or k-median. Feldman and Langberg [STOC'11] have shown that for k-clustering of n points in general metrics, it is possible to obtain coresets whose size depends logarithmically in n. Moreover, such a dependency in n is inevitable in general metrics. A significant amount of recent work in the area is devoted to obtaining coresests whose sizes are independent of n for special metrics, like d-dimensional Euclidean space [Huang, Vishnoi, STOC'20], doubling metrics [Huang, Jiang, Li, Wu, FOCS'18], metrics of graphs of bounded treewidth [Baker, Braverman, Huang, Jiang, Krauthgamer, Wu, ICML’20], or graphs excluding a fixed minor [Braverman, Jiang, Krauthgamer, Wu, SODA’21].
In this paper, we provide the first constructions of coresets whose size does not depend on n for k-clustering in the metrics induced by geometric intersection graphs. For example, we obtain (k log²k)/ε^𝒪(1) size coresets for k-clustering in Euclidean-weighted unit-disk graphs (UDGs) and unit-square graphs (USGs). These constructions follow from a general theorem that identifies two canonical properties of a graph metric sufficient for obtaining coresets whose size is independent of n. The proof of our theorem builds on the recent work of Cohen-Addad, Saulpic, and Schwiegelshohn [STOC '21], which ensures small-sized coresets conditioned on the existence of an interesting set of centers, called centroid set. The main technical contribution of our work is the proof of the existence of such a small-sized centroid set for graphs that satisfy the two canonical properties. Loosely speaking, the metrics of geometric intersection graphs are "similar" to the Euclidean metrics for points that are close, and to the shortest path metrics of planar graphs for points that are far apart. The main technical challenge in constructing centroid sets of small sizes is in combining these two very different metrics.
The new coreset construction helps to design the first (1+ε)-approximation for center-based clustering problems in UDGs and USGs, that is fixed-parameter tractable in k and ε (FPT-AS).

Sayan Bandyapadhyay, Fedor V. Fomin, and Tanmay Inamdar. Coresets for Clustering in Geometric Intersection Graphs. In 39th International Symposium on Computational Geometry (SoCG 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 258, pp. 10:1-10:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.SoCG.2023.10, author = {Bandyapadhyay, Sayan and Fomin, Fedor V. and Inamdar, Tanmay}, title = {{Coresets for Clustering in Geometric Intersection Graphs}}, booktitle = {39th International Symposium on Computational Geometry (SoCG 2023)}, pages = {10:1--10:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-273-0}, ISSN = {1868-8969}, year = {2023}, volume = {258}, editor = {Chambers, Erin W. and Gudmundsson, Joachim}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2023.10}, URN = {urn:nbn:de:0030-drops-178605}, doi = {10.4230/LIPIcs.SoCG.2023.10}, annote = {Keywords: k-median, k-means, clustering, coresets, geometric graphs} }

Document

**Published in:** LIPIcs, Volume 258, 39th International Symposium on Computational Geometry (SoCG 2023)

We revisit a natural variant of the geometric set cover problem, called minimum-membership geometric set cover (MMGSC). In this problem, the input consists of a set S of points and a set ℛ of geometric objects, and the goal is to find a subset ℛ^* ⊆ ℛ to cover all points in S such that the membership of S with respect to ℛ^*, denoted by memb(S,ℛ^*), is minimized, where memb(S,ℛ^*) = max_{p ∈ S} |{R ∈ ℛ^*: p ∈ R}|. We give the first polynomial-time approximation algorithms for MMGSC in ℝ². Specifically, we achieve the following two main results.
- We give the first polynomial-time constant-approximation algorithm for MMGSC with unit squares. This answers a question left open since the work of Erlebach and Leeuwen [SODA'08], who gave a constant-approximation algorithm with running time n^{O(opt)} where opt is the optimum of the problem (i.e., the minimum membership).
- We give the first polynomial-time approximation scheme (PTAS) for MMGSC with halfplanes. Prior to this work, it was even unknown whether the problem can be approximated with a factor of o(log n) in polynomial time, while it is well-known that the minimum-size set cover problem with halfplanes can be solved in polynomial time. We also consider a problem closely related to MMGSC, called minimum-ply geometric set cover (MPGSC), in which the goal is to find ℛ^* ⊆ ℛ to cover S such that the ply of ℛ^* is minimized, where the ply is defined as the maximum number of objects in ℛ^* which have a nonempty common intersection. Very recently, Durocher et al. gave the first constant-approximation algorithm for MPGSC with unit squares which runs in O(n^{12}) time. We give a significantly simpler constant-approximation algorithm with near-linear running time.

Sayan Bandyapadhyay, William Lochet, Saket Saurabh, and Jie Xue. Minimum-Membership Geometric Set Cover, Revisited. In 39th International Symposium on Computational Geometry (SoCG 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 258, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.SoCG.2023.11, author = {Bandyapadhyay, Sayan and Lochet, William and Saurabh, Saket and Xue, Jie}, title = {{Minimum-Membership Geometric Set Cover, Revisited}}, booktitle = {39th International Symposium on Computational Geometry (SoCG 2023)}, pages = {11:1--11:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-273-0}, ISSN = {1868-8969}, year = {2023}, volume = {258}, editor = {Chambers, Erin W. and Gudmundsson, Joachim}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2023.11}, URN = {urn:nbn:de:0030-drops-178610}, doi = {10.4230/LIPIcs.SoCG.2023.11}, annote = {Keywords: geometric set cover, geometric optimization, approximation algorithms} }

Document

**Published in:** LIPIcs, Volume 258, 39th International Symposium on Computational Geometry (SoCG 2023)

Clustering with capacity constraints is a fundamental problem that attracted significant attention throughout the years. In this paper, we give the first FPT constant-factor approximation algorithm for the problem of clustering points in a general metric into k clusters to minimize the sum of cluster radii, subject to non-uniform hard capacity constraints (Capacitated Sum of Radii ). In particular, we give a (15+ε)-approximation algorithm that runs in 2^𝒪(k²log k) ⋅ n³ time.
When capacities are uniform, we obtain the following improved approximation bounds.
- A (4 + ε)-approximation with running time 2^𝒪(klog(k/ε)) n³, which significantly improves over the FPT 28-approximation of Inamdar and Varadarajan [ESA 2020].
- A (2 + ε)-approximation with running time 2^𝒪(k/ε² ⋅log(k/ε)) dn³ and a (1+ε)-approxim- ation with running time 2^𝒪(kdlog ((k/ε))) n³ in the Euclidean space. Here d is the dimension.
- A (1 + ε)-approximation in the Euclidean space with running time 2^𝒪(k/ε² ⋅log(k/ε)) dn³ if we are allowed to violate the capacities by (1 + ε)-factor. We complement this result by showing that there is no (1 + ε)-approximation algorithm running in time f(k)⋅ n^𝒪(1), if any capacity violation is not allowed.

Sayan Bandyapadhyay, William Lochet, and Saket Saurabh. FPT Constant-Approximations for Capacitated Clustering to Minimize the Sum of Cluster Radii. In 39th International Symposium on Computational Geometry (SoCG 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 258, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.SoCG.2023.12, author = {Bandyapadhyay, Sayan and Lochet, William and Saurabh, Saket}, title = {{FPT Constant-Approximations for Capacitated Clustering to Minimize the Sum of Cluster Radii}}, booktitle = {39th International Symposium on Computational Geometry (SoCG 2023)}, pages = {12:1--12:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-273-0}, ISSN = {1868-8969}, year = {2023}, volume = {258}, editor = {Chambers, Erin W. and Gudmundsson, Joachim}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2023.12}, URN = {urn:nbn:de:0030-drops-178628}, doi = {10.4230/LIPIcs.SoCG.2023.12}, annote = {Keywords: Clustering, FPT-approximation} }

Document

**Published in:** LIPIcs, Volume 249, 17th International Symposium on Parameterized and Exact Computation (IPEC 2022)

In this paper, we consider the Minimum-Load k-Clustering/Facility Location (MLkC) problem where we are given a set P of n points in a metric space that we have to cluster and an integer k > 0 that denotes the number of clusters. Additionally, we are given a set F of cluster centers in the same metric space. The goal is to select a set C ⊆ F of k centers and assign each point in P to a center in C, such that the maximum load over all centers is minimized. Here the load of a center is the sum of the distances between it and the points assigned to it.
Although clustering/facility location problems have rich literature, the minimum-load objective has not been studied substantially, and hence MLkC has remained a poorly understood problem. More interestingly, the problem is notoriously hard even in some special cases including the one in line metrics as shown by Ahmadian et al. [APPROX 2014, ACM Trans. Algorithms 2018]. They also show APX-hardness of the problem in the plane. On the other hand, the best-known approximation factor for MLkC is O(k), even in the plane.
In this work, we study a fair version of MLkC inspired by the work of Chierichetti et al. [NeurIPS, 2017]. Here the input points are partitioned into 𝓁 protected groups, and only clusters that proportionally represent each group are allowed. MLkC is the special case with 𝓁 = 1. For the fair version, we are able to obtain a randomized 3-approximation algorithm in f(k,𝓁)⋅ n^O(1) time. Also, our scheme leads to an improved (1 + ε)-approximation in the case of Euclidean norm with the same running time (depending also linearly on the dimension d). Our results imply the same approximations for MLkC with running time f(k)⋅ n^O(1), achieving the first constant-factor FPT approximations for this problem in general and Euclidean metric spaces.

Sayan Bandyapadhyay, Fedor V. Fomin, Petr A. Golovach, Nidhi Purohit, and Kirill Simonov. FPT Approximation for Fair Minimum-Load Clustering. In 17th International Symposium on Parameterized and Exact Computation (IPEC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 249, pp. 4:1-4:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.IPEC.2022.4, author = {Bandyapadhyay, Sayan and Fomin, Fedor V. and Golovach, Petr A. and Purohit, Nidhi and Simonov, Kirill}, title = {{FPT Approximation for Fair Minimum-Load Clustering}}, booktitle = {17th International Symposium on Parameterized and Exact Computation (IPEC 2022)}, pages = {4:1--4:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-260-0}, ISSN = {1868-8969}, year = {2022}, volume = {249}, editor = {Dell, Holger and Nederlof, Jesper}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.IPEC.2022.4}, URN = {urn:nbn:de:0030-drops-173600}, doi = {10.4230/LIPIcs.IPEC.2022.4}, annote = {Keywords: fair clustering, load balancing, parameterized approximation} }

Document

**Published in:** LIPIcs, Volume 224, 38th International Symposium on Computational Geometry (SoCG 2022)

We prove a structural theorem for unit-disk graphs, which (roughly) states that given a set 𝒟 of n unit disks inducing a unit-disk graph G_𝒟 and a number p ∈ [n], one can partition 𝒟 into p subsets 𝒟₁,… ,𝒟_p such that for every i ∈ [p] and every 𝒟' ⊆ 𝒟_i, the graph obtained from G_𝒟 by contracting all edges between the vertices in 𝒟_i $1𝒟' admits a tree decomposition in which each bag consists of O(p+|𝒟'|) cliques. Our theorem can be viewed as an analog for unit-disk graphs of the structural theorems for planar graphs and almost-embeddable graphs proved very recently by Marx et al. [SODA'22] and Bandyapadhyay et al. [SODA'22].
By applying our structural theorem, we give several new combinatorial and algorithmic results for unit-disk graphs. On the combinatorial side, we obtain the first Contraction Decomposition Theorem (CDT) for unit-disk graphs, resolving an open question in the work Panolan et al. [SODA'19]. On the algorithmic side, we obtain a new FPT algorithm for bipartization (also known as odd cycle transversal) on unit-disk graphs, which runs in 2^{O(√k log k)} ⋅ n^{O(1)} time, where k denotes the solution size. Our algorithm significantly improves the previous slightly subexponential-time algorithm given by Lokshtanov et al. [SODA'22] (which works more generally for disk graphs) and is almost optimal, as the problem cannot be solved in 2^{o(√k)} ⋅ n^{O(1)} time assuming the ETH.

Sayan Bandyapadhyay, William Lochet, Daniel Lokshtanov, Saket Saurabh, and Jie Xue. True Contraction Decomposition and Almost ETH-Tight Bipartization for Unit-Disk Graphs. In 38th International Symposium on Computational Geometry (SoCG 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 224, pp. 11:1-11:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.SoCG.2022.11, author = {Bandyapadhyay, Sayan and Lochet, William and Lokshtanov, Daniel and Saurabh, Saket and Xue, Jie}, title = {{True Contraction Decomposition and Almost ETH-Tight Bipartization for Unit-Disk Graphs}}, booktitle = {38th International Symposium on Computational Geometry (SoCG 2022)}, pages = {11:1--11:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-227-3}, ISSN = {1868-8969}, year = {2022}, volume = {224}, editor = {Goaoc, Xavier and Kerber, Michael}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2022.11}, URN = {urn:nbn:de:0030-drops-160190}, doi = {10.4230/LIPIcs.SoCG.2022.11}, annote = {Keywords: unit-disk graphs, tree decomposition, contraction decomposition, bipartization} }

Document

**Published in:** LIPIcs, Volume 212, 32nd International Symposium on Algorithms and Computation (ISAAC 2021)

Given two sets S and T of points in the plane, of total size n, a many-to-many matching between S and T is a set of pairs (p,q) such that p ∈ S, q ∈ T and for each r ∈ S ∪ T, r appears in at least one such pair. The cost of a pair (p,q) is the (Euclidean) distance between p and q. In the minimum-cost many-to-many matching problem, the goal is to compute a many-to-many matching such that the sum of the costs of the pairs is minimized. This problem is a restricted version of minimum-weight edge cover in a bipartite graph, and hence can be solved in O(n³) time. In a more restricted setting where all the points are on a line, the problem can be solved in O(nlog n) time [Justin Colannino et al., 2007]. However, no progress has been made in the general planar case in improving the cubic time bound. In this paper, we obtain an O(n²⋅ poly(log n)) time exact algorithm and an O(n^{3/2}⋅ poly(log n)) time (1+ε)-approximation in the planar case.

Sayan Bandyapadhyay, Anil Maheshwari, and Michiel Smid. Exact and Approximation Algorithms for Many-To-Many Point Matching in the Plane. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 44:1-44:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.ISAAC.2021.44, author = {Bandyapadhyay, Sayan and Maheshwari, Anil and Smid, Michiel}, title = {{Exact and Approximation Algorithms for Many-To-Many Point Matching in the Plane}}, booktitle = {32nd International Symposium on Algorithms and Computation (ISAAC 2021)}, pages = {44:1--44:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-214-3}, ISSN = {1868-8969}, year = {2021}, volume = {212}, editor = {Ahn, Hee-Kap and Sadakane, Kunihiko}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2021.44}, URN = {urn:nbn:de:0030-drops-154779}, doi = {10.4230/LIPIcs.ISAAC.2021.44}, annote = {Keywords: Many-to-many matching, bipartite, planar, geometric, approximation} }

Document

**Published in:** LIPIcs, Volume 202, 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)

We develop new algorithmic methods with provable guarantees for feature selection in regard to categorical data clustering. While feature selection is one of the most common approaches to reduce dimensionality in practice, most of the known feature selection methods are heuristics. We study the following mathematical model. We assume that there are some inadvertent (or undesirable) features of the input data that unnecessarily increase the cost of clustering. Consequently, we want to select a subset of the original features from the data such that there is a small-cost clustering on the selected features. More precisely, for given integers l (the number of irrelevant features) and k (the number of clusters), budget B, and a set of n categorical data points (represented by m-dimensional vectors whose elements belong to a finite set of values Σ), we want to select m-l relevant features such that the cost of any optimal k-clustering on these features does not exceed B. Here the cost of a cluster is the sum of Hamming distances (l0-distances) between the selected features of the elements of the cluster and its center. The clustering cost is the total sum of the costs of the clusters.
We use the framework of parameterized complexity to identify how the complexity of the problem depends on parameters k, B, and |Σ|. Our main result is an algorithm that solves the Feature Selection problem in time f(k,B,|Σ|)⋅m^{g(k,|Σ|)}⋅n² for some functions f and g. In other words, the problem is fixed-parameter tractable parameterized by B when |Σ| and k are constants. Our algorithm for Feature Selection is based on a solution to a more general problem, Constrained Clustering with Outliers. In this problem, we want to delete a certain number of outliers such that the remaining points could be clustered around centers satisfying specific constraints. One interesting fact about Constrained Clustering with Outliers is that besides Feature Selection, it encompasses many other fundamental problems regarding categorical data such as Robust Clustering, Binary and Boolean Low-rank Matrix Approximation with Outliers, and Binary Robust Projective Clustering. Thus as a byproduct of our theorem, we obtain algorithms for all these problems. We also complement our algorithmic findings with complexity lower bounds.

Sayan Bandyapadhyay, Fedor V. Fomin, Petr A. Golovach, and Kirill Simonov. Parameterized Complexity of Feature Selection for Categorical Data Clustering. In 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 202, pp. 14:1-14:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.MFCS.2021.14, author = {Bandyapadhyay, Sayan and Fomin, Fedor V. and Golovach, Petr A. and Simonov, Kirill}, title = {{Parameterized Complexity of Feature Selection for Categorical Data Clustering}}, booktitle = {46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)}, pages = {14:1--14:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-201-3}, ISSN = {1868-8969}, year = {2021}, volume = {202}, editor = {Bonchi, Filippo and Puglisi, Simon J.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2021.14}, URN = {urn:nbn:de:0030-drops-144544}, doi = {10.4230/LIPIcs.MFCS.2021.14}, annote = {Keywords: Robust clustering, PCA, Low rank approximation, Hypergraph enumeration} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

Fair clustering is a variant of constrained clustering where the goal is to partition a set of colored points. The fraction of points of each color in every cluster should be more or less equal to the fraction of points of this color in the dataset. This variant was recently introduced by Chierichetti et al. [NeurIPS 2017] and became widely popular. This paper proposes a new construction of coresets for fair k-means and k-median clustering for Euclidean and general metrics based on random sampling. For the Euclidean space ℝ^d, we provide the first coresets whose size does not depend exponentially on the dimension d. The question of whether such constructions exist was asked by Schmidt, Schwiegelshohn, and Sohler [WAOA 2019] and Huang, Jiang, and Vishnoi [NeurIPS 2019]. For general metric, our construction provides the first coreset for fair k-means and k-median.
New coresets appear to be a handy tool for designing better approximation and streaming algorithms for fair and other constrained clustering variants. In particular, we obtain
- the first fixed-parameter tractable (FPT) PTAS for fair k-means and k-median clustering in ℝ^d. The near-linear time of our PTAS improves over the previous scheme of Böhm, Fazzone, Leonardi, and Schwiegelshohn [ArXiv 2020] with running time n^{poly(k/ε)};
- FPT "true" constant-approximation for metric fair clustering. All previous algorithms for fair k-means and k-median in general metric are bicriteria and violate the fairness constraints;
- FPT 3-approximation for lower-bounded k-median improving the best-known 3.736 factor of Bera, Chakrabarty, and Negahbani [ArXiv 2019];
- the first FPT constant-approximations for metric chromatic clustering and 𝓁-Diversity clustering;
- near linear-time (in n) PTAS for capacitated and lower-bounded clustering improving over PTAS of Bhattacharya, Jaiswal, and Kumar [TOCS 2018] with super-quadratic running time;
- a streaming (1+ε)-approximation for fair k-means and k-median of space complexity polynomial in k, d, ε and log{n} (the previous algorithms have exponential space complexity on either d or k).

Sayan Bandyapadhyay, Fedor V. Fomin, and Kirill Simonov. On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 23:1-23:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.ICALP.2021.23, author = {Bandyapadhyay, Sayan and Fomin, Fedor V. and Simonov, Kirill}, title = {{On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications}}, booktitle = {48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)}, pages = {23:1--23:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-195-5}, ISSN = {1868-8969}, year = {2021}, volume = {198}, editor = {Bansal, Nikhil and Merelli, Emanuela and Worrell, James}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.23}, URN = {urn:nbn:de:0030-drops-140923}, doi = {10.4230/LIPIcs.ICALP.2021.23}, annote = {Keywords: fair clustering, coresets, approximation algorithms} }

Document

**Published in:** LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

In the Metric Capacitated Covering (MCC) problem, given a set of balls ℬ in a metric space P with metric d and a capacity parameter U, the goal is to find a minimum sized subset ℬ' ⊆ ℬ and an assignment of the points in P to the balls in ℬ' such that each point is assigned to a ball that contains it and each ball is assigned with at most U points. MCC achieves an O(log |P|)-approximation using a greedy algorithm. On the other hand, it is hard to approximate within a factor of o(log |P|) even with β < 3 factor expansion of the balls. Bandyapadhyay et al. [SoCG 2018, DCG 2019] showed that one can obtain an O(1)-approximation for the problem with 6.47 factor expansion of the balls. An open question left by their work is to reduce the gap between the lower bound 3 and the upper bound 6.47. In this current work, we show that it is possible to obtain an O(1)-approximation with only 4.24 factor expansion of the balls. We also show a similar upper bound of 5 for a more generalized version of MCC for which the best previously known bound was 9.

Sayan Bandyapadhyay. Improved Bounds for Metric Capacitated Covering Problems. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 9:1-9:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay:LIPIcs.ESA.2020.9, author = {Bandyapadhyay, Sayan}, title = {{Improved Bounds for Metric Capacitated Covering Problems}}, booktitle = {28th Annual European Symposium on Algorithms (ESA 2020)}, pages = {9:1--9:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-162-7}, ISSN = {1868-8969}, year = {2020}, volume = {173}, editor = {Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.9}, URN = {urn:nbn:de:0030-drops-128759}, doi = {10.4230/LIPIcs.ESA.2020.9}, annote = {Keywords: Capacitated covering, approximation algorithms, bicriteria approximation, LP rounding} }

Document

APPROX

**Published in:** LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

The Non-Uniform k-center (NUkC) problem has recently been formulated by Chakrabarty, Goyal and Krishnaswamy [ICALP, 2016] as a generalization of the classical k-center clustering problem. In NUkC, given a set of n points P in a metric space and non-negative numbers r₁, r₂, … , r_k, the goal is to find the minimum dilation α and to choose k balls centered at the points of P with radius α⋅ r_i for 1 ≤ i ≤ k, such that all points of P are contained in the union of the chosen balls. They showed that the problem is NP-hard to approximate within any factor even in tree metrics. On the other hand, they designed a "bi-criteria" constant approximation algorithm that uses a constant times k balls. Surprisingly, no true approximation is known even in the special case when the r_i’s belong to a fixed set of size 3. In this paper, we study the NUkC problem under perturbation resilience, which was introduced by Bilu and Linial [Combinatorics, Probability and Computing, 2012]. We show that the problem under 2-perturbation resilience is polynomial time solvable when the r_i’s belong to a constant sized set. However, we show that perturbation resilience does not help in the general case. In particular, our findings imply that even with perturbation resilience one cannot hope to find any "good" approximation for the problem.

Sayan Bandyapadhyay. On Perturbation Resilience of Non-Uniform k-Center. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 31:1-31:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay:LIPIcs.APPROX/RANDOM.2020.31, author = {Bandyapadhyay, Sayan}, title = {{On Perturbation Resilience of Non-Uniform k-Center}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)}, pages = {31:1--31:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-164-1}, ISSN = {1868-8969}, year = {2020}, volume = {176}, editor = {Byrka, Jaros{\l}aw and Meka, Raghu}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.31}, URN = {urn:nbn:de:0030-drops-126347}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2020.31}, annote = {Keywords: Non-Uniform k-center, stability, clustering, perturbation resilience} }

Document

**Published in:** LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

In this paper, we consider the colorful k-center problem, which is a generalization of the well-known k-center problem. Here, we are given red and blue points in a metric space, and a coverage requirement for each color. The goal is to find the smallest radius rho, such that with k balls of radius rho, the desired number of points of each color can be covered. We obtain a constant approximation for this problem in the Euclidean plane. We obtain this result by combining a "pseudo-approximation" algorithm that works in any metric space, and an approximation algorithm that works for a special class of instances in the plane. The latter algorithm uses a novel connection to a certain matching problem in graphs.

Sayan Bandyapadhyay, Tanmay Inamdar, Shreyas Pai, and Kasturi Varadarajan. A Constant Approximation for Colorful k-Center. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.ESA.2019.12, author = {Bandyapadhyay, Sayan and Inamdar, Tanmay and Pai, Shreyas and Varadarajan, Kasturi}, title = {{A Constant Approximation for Colorful k-Center}}, booktitle = {27th Annual European Symposium on Algorithms (ESA 2019)}, pages = {12:1--12:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-124-5}, ISSN = {1868-8969}, year = {2019}, volume = {144}, editor = {Bender, Michael A. and Svensson, Ola and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.12}, URN = {urn:nbn:de:0030-drops-111336}, doi = {10.4230/LIPIcs.ESA.2019.12}, annote = {Keywords: Colorful k-center, Euclidean Plane, LP Rounding, Outliers} }

Document

**Published in:** LIPIcs, Volume 117, 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)

We consider the Minimum Dominating Set (MDS) problem on the intersection graphs of geometric objects. Even for simple and widely-used geometric objects such as rectangles, no sub-logarithmic approximation is known for the problem and (perhaps surprisingly) the problem is NP-hard even when all the rectangles are "anchored" at a diagonal line with slope -1 (Pandit, CCCG 2017). In this paper, we first show that for any epsilon>0, there exists a (2+epsilon)-approximation algorithm for the MDS problem on "diagonal-anchored" rectangles, providing the first O(1)-approximation for the problem on a non-trivial subclass of rectangles. It is not hard to see that the MDS problem on "diagonal-anchored" rectangles is the same as the MDS problem on "diagonal-anchored" L-frames: the union of a vertical and a horizontal line segment that share an endpoint. As such, we also obtain a (2+epsilon)-approximation for the problem with "diagonal-anchored" L-frames. On the other hand, we show that the problem is APX-hard in case the input L-frames intersect the diagonal, or the horizontal segments of the L-frames intersect a vertical line. However, as we show, the problem is linear-time solvable in case the L-frames intersect a vertical as well as a horizontal line. Finally, we consider the MDS problem in the so-called "edge intersection model" and obtain a number of results, answering two questions posed by Mehrabi (WAOA 2017).

Sayan Bandyapadhyay, Anil Maheshwari, Saeed Mehrabi, and Subhash Suri. Approximating Dominating Set on Intersection Graphs of Rectangles and L-frames. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 117, pp. 37:1-37:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.MFCS.2018.37, author = {Bandyapadhyay, Sayan and Maheshwari, Anil and Mehrabi, Saeed and Suri, Subhash}, title = {{Approximating Dominating Set on Intersection Graphs of Rectangles and L-frames}}, booktitle = {43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)}, pages = {37:1--37:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-086-6}, ISSN = {1868-8969}, year = {2018}, volume = {117}, editor = {Potapov, Igor and Spirakis, Paul and Worrell, James}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2018.37}, URN = {urn:nbn:de:0030-drops-96198}, doi = {10.4230/LIPIcs.MFCS.2018.37}, annote = {Keywords: Minimum dominating set, Rectangles and L-frames, Approximation schemes, Local search, APX-hardness} }

Document

**Published in:** LIPIcs, Volume 116, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)

In the minimum constraint removal problem, we are given a set of geometric objects as obstacles in the plane, and we want to find the minimum number of obstacles that must be removed to reach a target point t from the source point s by an obstacle-free path. The problem is known to be intractable, and (perhaps surprisingly) no sub-linear approximations are known even for simple obstacles such as rectangles and disks. The main result of our paper is a new approximation technique that gives O(sqrt{n})-approximation for rectangles, disks as well as rectilinear polygons. The technique also gives O(sqrt{n})-approximation for the minimum color path problem in graphs. We also present some inapproximability results for the geometric constraint removal problem.

Sayan Bandyapadhyay, Neeraj Kumar, Subhash Suri, and Kasturi Varadarajan. Improved Approximation Bounds for the Minimum Constraint Removal Problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 116, pp. 2:1-2:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.APPROX-RANDOM.2018.2, author = {Bandyapadhyay, Sayan and Kumar, Neeraj and Suri, Subhash and Varadarajan, Kasturi}, title = {{Improved Approximation Bounds for the Minimum Constraint Removal Problem}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)}, pages = {2:1--2:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-085-9}, ISSN = {1868-8969}, year = {2018}, volume = {116}, editor = {Blais, Eric and Jansen, Klaus and D. P. Rolim, Jos\'{e} and Steurer, David}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2018.2}, URN = {urn:nbn:de:0030-drops-94066}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2018.2}, annote = {Keywords: Minimum Constraint Removal, Minimum Color Path, Barrier Resilience, Obstacle Removal, Obstacle Free Path, Approximation} }

Document

**Published in:** LIPIcs, Volume 99, 34th International Symposium on Computational Geometry (SoCG 2018)

In this article, we consider the following capacitated covering problem. We are given a set P of n points and a set B of balls from some metric space, and a positive integer U that represents the capacity of each of the balls in B. We would like to compute a subset B' subseteq B of balls and assign each point in P to some ball in B' that contains it, such that the number of points assigned to any ball is at most U. The objective function that we would like to minimize is the cardinality of B'.
We consider this problem in arbitrary metric spaces as well as Euclidean spaces of constant dimension. In the metric setting, even the uncapacitated version of the problem is hard to approximate to within a logarithmic factor. In the Euclidean setting, the best known approximation guarantee in dimensions 3 and higher is logarithmic in the number of points. Thus we focus on obtaining "bi-criteria" approximations. In particular, we are allowed to expand the balls in our solution by some factor, but optimal solutions do not have that flexibility. Our main result is that allowing constant factor expansion of the input balls suffices to obtain constant approximations for this problem. In fact, in the Euclidean setting, only (1+epsilon) factor expansion is sufficient for any epsilon > 0, with the approximation factor being a polynomial in 1/epsilon. We obtain these results using a unified scheme for rounding the natural LP relaxation; this scheme may be useful for other capacitated covering problems. We also complement these bi-criteria approximations by obtaining hardness of approximation results that shed light on our understanding of these problems.

Sayan Bandyapadhyay, Santanu Bhowmick, Tanmay Inamdar, and Kasturi Varadarajan. Capacitated Covering Problems in Geometric Spaces. In 34th International Symposium on Computational Geometry (SoCG 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 99, pp. 7:1-7:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.SoCG.2018.7, author = {Bandyapadhyay, Sayan and Bhowmick, Santanu and Inamdar, Tanmay and Varadarajan, Kasturi}, title = {{Capacitated Covering Problems in Geometric Spaces}}, booktitle = {34th International Symposium on Computational Geometry (SoCG 2018)}, pages = {7:1--7:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-066-8}, ISSN = {1868-8969}, year = {2018}, volume = {99}, editor = {Speckmann, Bettina and T\'{o}th, Csaba D.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2018.7}, URN = {urn:nbn:de:0030-drops-87205}, doi = {10.4230/LIPIcs.SoCG.2018.7}, annote = {Keywords: Capacitated covering, Geometric set cover, LP rounding, Bi-criteria approximation} }

Document

**Published in:** LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)

In this paper we consider two metric covering/clustering problems - Minimum Cost Covering Problem (MCC) and k-clustering. In the MCC problem, we are given two point sets X (clients) and Y (servers), and a metric on X cup Y. We would like to cover the clients by balls centered at the servers. The objective function to minimize is the sum of the alpha-th power of the radii of the balls. Here alpha geq 1 is a parameter of the problem (but not of a problem instance). MCC is closely related to the k-clustering problem. The main difference between k-clustering and MCC is that in k-clustering one needs to select k balls to cover the clients.
For any eps > 0, we describe quasi-polynomial time (1 + eps) approximation algorithms for both of the problems. However, in case of k-clustering the algorithm uses (1 + eps)k balls. Prior to our work, a 3^alpha and a c^alpha approximation were achieved by polynomial-time algorithms for MCC and k-clustering, respectively, where c > 1 is an absolute constant. These two problems are thus interesting examples of metric covering/clustering problems that admit (1 + eps)-approximation (using (1 + eps)k balls in case of k-clustering), if one is willing to settle for quasi-polynomial time. In contrast, for the variant of MCC where alpha is part of the input, we show under standard assumptions that no polynomial time algorithm can achieve an approximation factor better than O(log |X|) for alpha geq log |X|.

Sayan Bandyapadhyay and Kasturi Varadarajan. Approximate Clustering via Metric Partitioning. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.ISAAC.2016.15, author = {Bandyapadhyay, Sayan and Varadarajan, Kasturi}, title = {{Approximate Clustering via Metric Partitioning}}, booktitle = {27th International Symposium on Algorithms and Computation (ISAAC 2016)}, pages = {15:1--15:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-026-2}, ISSN = {1868-8969}, year = {2016}, volume = {64}, editor = {Hong, Seok-Hee}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.15}, URN = {urn:nbn:de:0030-drops-67751}, doi = {10.4230/LIPIcs.ISAAC.2016.15}, annote = {Keywords: Approximation Algorithms, Clustering, Covering, Probabilistic Parti- tions} }

Document

**Published in:** LIPIcs, Volume 51, 32nd International Symposium on Computational Geometry (SoCG 2016)

Clustering problems often arise in fields like data mining and machine learning. Clustering usually refers to the task of partitioning a collection of objects into groups with similar elements, with respect to a similarity (or dissimilarity) measure. Among the clustering problems, k-means clustering in particular has received much attention from researchers. Despite the fact that k-means is a well studied problem, its status in the plane is still open. In particular, it is unknown whether it admits a PTAS in the plane. The best known approximation bound achievable in polynomial time is 9+epsilon.
In this paper, we consider the following variant of k-means. Given a set C of points in R^d and a real f > 0, find a finite set F of points in R^d that minimizes the quantity f*|F|+sum_{p in C} min_{q in F} {||p-q||}^2. For any fixed dimension d, we design a PTAS for this problem that is based on local search. We also give a "bi-criterion" local search algorithm for k-means which uses (1+epsilon)k centers and yields a solution whose cost is at most (1+epsilon) times the cost of an optimal k-means solution. The algorithm runs in polynomial time for any fixed dimension.
The contribution of this paper is two-fold. On the one hand, we are able to handle the square of distances in an elegant manner, obtaining a near-optimal approximation bound. This leads us towards a better understanding of the k-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that little is known about the local search method for geometric approximation.

Sayan Bandyapadhyay and Kasturi Varadarajan. On Variants of k-means Clustering. In 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 51, pp. 14:1-14:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{bandyapadhyay_et_al:LIPIcs.SoCG.2016.14, author = {Bandyapadhyay, Sayan and Varadarajan, Kasturi}, title = {{On Variants of k-means Clustering}}, booktitle = {32nd International Symposium on Computational Geometry (SoCG 2016)}, pages = {14:1--14:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-009-5}, ISSN = {1868-8969}, year = {2016}, volume = {51}, editor = {Fekete, S\'{a}ndor and Lubiw, Anna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2016.14}, URN = {urn:nbn:de:0030-drops-59061}, doi = {10.4230/LIPIcs.SoCG.2016.14}, annote = {Keywords: k-means, Facility location, Local search, Geometric approximation} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail