DROPS

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.29

Guessing Efficiently for Constrained Subspace Approximation

Authors: Aditya Bhaskara, Sepideh Mahabadi, Madhusudhan Reddy Pittu, Ali Vakilian, and David P. Woodruff

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

In this paper we study constrained subspace approximation problem. Given a set of n points {a₁,…,a_n} in ℝ^d, the goal of the subspace approximation problem is to find a k dimensional subspace that best approximates the input points. More precisely, for a given p ≥ 1, we aim to minimize the pth power of the 𝓁_p norm of the error vector (‖a₁-Pa₁‖,…,‖a_n-Pa_n‖), where P denotes the projection matrix onto the subspace and the norms are Euclidean. In constrained subspace approximation (CSA), we additionally have constraints on the projection matrix P. In its most general form, we require P to belong to a given subset 𝒮 that is described explicitly or implicitly. We introduce a general framework for constrained subspace approximation. Our approach, that we term coreset-guess-solve, yields either (1+ε)-multiplicative or ε-additive approximations for a variety of constraints. We show that it provides new algorithms for partition-constrained subspace approximation with applications to fair subspace approximation, k-means clustering, and projected non-negative matrix factorization, among others. Specifically, while we reconstruct the best known bounds for k-means clustering in Euclidean spaces, we improve the known results for the remainder of the problems.

Cite as

Aditya Bhaskara, Sepideh Mahabadi, Madhusudhan Reddy Pittu, Ali Vakilian, and David P. Woodruff. Guessing Efficiently for Constrained Subspace Approximation. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 29:1-29:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bhaskara_et_al:LIPIcs.ICALP.2025.29,
  author =	{Bhaskara, Aditya and Mahabadi, Sepideh and Pittu, Madhusudhan Reddy and Vakilian, Ali and Woodruff, David P.},
  title =	{{Guessing Efficiently for Constrained Subspace Approximation}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{29:1--29:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.29},
  URN =		{urn:nbn:de:0030-drops-234068},
  doi =		{10.4230/LIPIcs.ICALP.2025.29},
  annote =	{Keywords: parameterized complexity, low rank approximation, fairness, non-negative matrix factorization, clustering}
}

@InProceedings{bhaskara_et_al:LIPIcs.ICALP.2025.29,
  author =	{Bhaskara, Aditya and Mahabadi, Sepideh and Pittu, Madhusudhan Reddy and Vakilian, Ali and Woodruff, David P.},
  title =	{{Guessing Efficiently for Constrained Subspace Approximation}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{29:1--29:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.29},
  URN =		{urn:nbn:de:0030-drops-234068},
  doi =		{10.4230/LIPIcs.ICALP.2025.29},
  annote =	{Keywords: parameterized complexity, low rank approximation, fairness, non-negative matrix factorization, clustering}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.101

Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case

Authors: Shaofeng H.-C. Jiang and Jianing Lou

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

We devise ε-coresets for robust (k,z)-Clustering with m outliers through black-box reductions to vanilla clustering. Given an ε-coreset construction for vanilla clustering with size N, we construct coresets of size N⋅ polylog(kmε^{-1}) + O_z(min{kmε^{-1}, m ε^{-2z}log^z(kmε^{-1})}) for various metric spaces, where O_z hides 2^{O(zlog z)} factors. This increases the size of the vanilla coreset by a small multiplicative factor of polylog(kmε^{-1}), and the additive term is up to a (ε^{-1}log (km))^{O(z)} factor to the size of the optimal robust coreset. Plugging in recent vanilla coreset results of [Cohen-Addad, Saulpic and Schwiegelshohn, STOC'21; Cohen-Addad, Draganov, Russo, Saulpic and Schwiegelshohn, SODA'25], we obtain the first coresets for (k,z)-Clustering with m outliers with size near-linear in k while previous results have size at least Ω(k²) [Huang, Jiang, Lou and Wu, ICLR'23; Huang, Li, Lu and Wu, SODA'25]. Technically, we establish two conditions under which a vanilla coreset is as well a robust coreset. The first condition requires the dataset to satisfy special structures - it can be broken into "dense" parts with bounded diameter. We combine this with a new bounded-diameter decomposition that has only O_z(km ε^{-1}) non-dense points to obtain the O_z(km ε^{-1}) additive bound. Another sufficient condition requires the vanilla coreset to possess an extra size-preserving property. To utilize this condition, we further give a black-box reduction that turns a vanilla coreset to the one that satisfies the said size-preserving property, and this leads to the alternative O_z(mε^{-2z}log^{z}(kmε^{-1})) additive size bound. We also give low-space implementations of our reductions in the dynamic streaming setting. Combined with known streaming constructions for vanilla coresets [Braverman, Frahling, Lang, Sohler and Yang, ICML'17; Hu, Song, Yang and Zhong, arXiv'1802.00459], we obtain the first dynamic streaming algorithms for coresets for k-Median (and k-Means) with m outliers, using space Õ(k + m) ⋅ poly(dε^{-1}log Δ) for inputs on a discrete grid [Δ]^d.

Cite as

Shaofeng H.-C. Jiang and Jianing Lou. Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 101:1-101:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.ICALP.2025.101,
  author =	{Jiang, Shaofeng H.-C. and Lou, Jianing},
  title =	{{Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{101:1--101:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.101},
  URN =		{urn:nbn:de:0030-drops-234781},
  doi =		{10.4230/LIPIcs.ICALP.2025.101},
  annote =	{Keywords: Coresets, clustering, outliers, streaming algorithms}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2025.62

On Approximability of 𝓁₂² Min-Sum Clustering

Authors: Karthik C. S., Euiwoong Lee, Yuval Rabani, Chris Schwiegelshohn, and Samson Zhou

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)

Abstract

The 𝓁₂² min-sum k-clustering problem is to partition an input set into clusters C_1,…,C_k to minimize ∑_{i=1}^k ∑_{p,q ∈ C_i} ‖p-q‖₂². Although 𝓁₂² min-sum k-clustering is NP-hard, it is not known whether it is NP-hard to approximate 𝓁₂² min-sum k-clustering beyond a certain factor. In this paper, we give the first hardness-of-approximation result for the 𝓁₂² min-sum k-clustering problem. We show that it is NP-hard to approximate the objective to a factor better than 1.056 and moreover, assuming a balanced variant of the Johnson Coverage Hypothesis, it is NP-hard to approximate the objective to a factor better than 1.327. We then complement our hardness result by giving a fast PTAS for 𝓁₂² min-sum k-clustering. Specifically, our algorithm runs in time O(n^{1+o(1)}d⋅ 2^{(k/ε)^O(1)}), which is the first nearly linear time algorithm for this problem. We also consider a learning-augmented setting, where the algorithm has access to an oracle that outputs a label i ∈ [k] for input point, thereby implicitly partitioning the input dataset into k clusters that induce an approximately optimal solution, up to some amount of adversarial error α ∈ [0,1/2). We give a polynomial-time algorithm that outputs a (1+γα)/(1-α)²-approximation to 𝓁₂² min-sum k-clustering, for a fixed constant γ > 0.

Cite as

Karthik C. S., Euiwoong Lee, Yuval Rabani, Chris Schwiegelshohn, and Samson Zhou. On Approximability of 𝓁₂² Min-Sum Clustering. In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 62:1-62:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{karthikc.s._et_al:LIPIcs.SoCG.2025.62,
  author =	{Karthik C. S. and Lee, Euiwoong and Rabani, Yuval and Schwiegelshohn, Chris and Zhou, Samson},
  title =	{{On Approximability of 𝓁₂² Min-Sum Clustering}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{62:1--62:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.62},
  URN =		{urn:nbn:de:0030-drops-232142},
  doi =		{10.4230/LIPIcs.SoCG.2025.62},
  annote =	{Keywords: Clustering, hardness of approximation, polynomial-time approximation schemes, learning-augmented algorithms}
}

Document

DOI: 10.4230/LIPIcs.ESA.2022.84

An Empirical Evaluation of k-Means Coresets

Authors: Chris Schwiegelshohn and Omar Ali Sheikh-Omar

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

Abstract

Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high performance coresets for clustering problems such as k-means in both theory and practice. Curiously, there exists no work on comparing the quality of available k-means coresets. In this paper we perform such an evaluation. There currently is no algorithm known to measure the distortion of a candidate coreset. We provide some evidence as to why this might be computationally difficult. To complement this, we propose a benchmark for which we argue that computing coresets is challenging and which also allows us an easy (heuristic) evaluation of coresets. Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.

Cite as

Chris Schwiegelshohn and Omar Ali Sheikh-Omar. An Empirical Evaluation of k-Means Coresets. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 84:1-84:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{schwiegelshohn_et_al:LIPIcs.ESA.2022.84,
  author =	{Schwiegelshohn, Chris and Sheikh-Omar, Omar Ali},
  title =	{{An Empirical Evaluation of k-Means Coresets}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{84:1--84:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.84},
  URN =		{urn:nbn:de:0030-drops-170225},
  doi =		{10.4230/LIPIcs.ESA.2022.84},
  annote =	{Keywords: coresets, k-means coresets, evaluation, benchmark}
}

4 Search Results for "Sheikh-Omar, Omar Ali"

Guessing Efficiently for Constrained Subspace Approximation

Abstract

Cite as

Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case

Abstract

Cite as

On Approximability of 𝓁₂² Min-Sum Clustering

Abstract

Cite as

An Empirical Evaluation of k-Means Coresets

Abstract

Cite as

Thanks for your feedback!

Could not send message