DROPS

Document

DOI: 10.4230/LIPIcs.ITCS.2026.82

Dimension Reduction for Clustering: The Curious Case of Discrete Centers

Authors: Shaofeng H.-C. Jiang, Robert Krauthgamer, Shay Sapir, Sandeep Silwal, and Di Yue

Published in: LIPIcs, Volume 362, 17th Innovations in Theoretical Computer Science Conference (ITCS 2026)

Abstract

The Johnson-Lindenstrauss transform is a fundamental method for dimension reduction in Euclidean spaces that can map any dataset of n points into dimension O(log n) with low distortion of their distances. This dimension bound is tight in general, but one can bypass it for specific problems. Indeed, tremendous progress has been made for clustering problems, especially in the continuous setting where centers can be picked from the ambient space ℝ^d. Most notably, for k-median and k-means, the dimension bound was improved to O(log k) [Makarychev, Makarychev and Razenshteyn, STOC 2019]. We explore dimension reduction for clustering in the discrete setting, where centers can only be picked from the dataset, and present two results that are both parameterized by the doubling dimension of the dataset, denoted as ddim. The first result shows that dimension O_{ε}(ddim + log k + log log n) suffices, and is moreover tight, to guarantee that the cost is preserved within factor 1±ε for every set of centers. Our second result eliminates the log log n term in the dimension through a relaxation of the guarantee (namely, preserving the cost only for all approximately-optimal sets of centers), which maintains its usefulness for downstream applications. Overall, we achieve strong dimension reduction in the discrete setting, and find that it differs from the continuous setting not only in the dimension bound, which depends on the doubling dimension, but also in the guarantees beyond preserving the optimal value, such as which clusterings are preserved.

Cite as

Shaofeng H.-C. Jiang, Robert Krauthgamer, Shay Sapir, Sandeep Silwal, and Di Yue. Dimension Reduction for Clustering: The Curious Case of Discrete Centers. In 17th Innovations in Theoretical Computer Science Conference (ITCS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 362, pp. 82:1-82:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)


@InProceedings{jiang_et_al:LIPIcs.ITCS.2026.82,
  author =	{Jiang, Shaofeng H.-C. and Krauthgamer, Robert and Sapir, Shay and Silwal, Sandeep and Yue, Di},
  title =	{{Dimension Reduction for Clustering: The Curious Case of Discrete Centers}},
  booktitle =	{17th Innovations in Theoretical Computer Science Conference (ITCS 2026)},
  pages =	{82:1--82:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-410-9},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{362},
  editor =	{Saraf, Shubhangi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2026.82},
  URN =		{urn:nbn:de:0030-drops-253698},
  doi =		{10.4230/LIPIcs.ITCS.2026.82},
  annote =	{Keywords: dimension reduction, clustering, k-median, k-means, doubling dimension}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2024.64

Moderate Dimension Reduction for k-Center Clustering

Authors: Shaofeng H.-C. Jiang, Robert Krauthgamer, and Shay Sapir

Published in: LIPIcs, Volume 293, 40th International Symposium on Computational Geometry (SoCG 2024)

Abstract

The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of n points in ℝ^d and any fixed ε > 0, it reduces the dimension d to O(log n) while preserving, with high probability, all the pairwise Euclidean distances within factor 1+ε. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem on the pointset, e.g., Euclidean max-cut or k-means. However, for some notorious problems, like diameter (aka furthest pair), dimension reduction via the JL map to below O(log n) does not preserve the optimal value within factor 1+ε. We propose to focus on another regime, of moderate dimension reduction, where a problem’s value is preserved within factor α > 1 using target dimension (log n)/poly(α). We establish the viability of this approach and show that the famous k-center problem is α-approximated when reducing to dimension O((log n)/α² + log k). Along the way, we address the diameter problem via the special case k = 1. Our result extends to several important variants of k-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input’s doubling dimension. While our poly(α)-factor improvement in the dimension may seem small, it actually has significant implications for streaming algorithms, and easily yields an algorithm for k-center in dynamic geometric streams, that achieves O(α)-approximation using space poly(kdn^{1/α²}). This is the first algorithm to beat O(n) space in high dimension d, as all previous algorithms require space at least exp(d). Furthermore, it extends to the k-center variants mentioned above.
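
Illustrative sketch (not part of the DROPS record or the paper): the random linear map mentioned in the abstract can be pictured as multiplication by a scaled Gaussian matrix. The function name jl_project, its parameters, and the choice of Gaussian entries are assumptions made here for illustration; the target dimension is passed in directly and no constant from the abstract is implied.

```python
import math
import random


def jl_project(points, target_dim, seed=0):
    """Project points (lists of floats) to target_dim coordinates.

    Uses a random matrix with i.i.d. N(0, 1/target_dim) entries, one
    standard way to realize a JL-style random linear map (an assumption
    of this sketch, not a detail taken from the abstract).
    """
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One row per output coordinate, one column per input coordinate.
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)]
              for _ in range(target_dim)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix]
            for p in points]


def max_distortion(points, projected):
    """Largest relative change of any pairwise distance after projection."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            before = math.dist(points[i], points[j])
            after = math.dist(projected[i], projected[j])
            worst = max(worst, abs(after / before - 1.0))
    return worst
```

Calling max_distortion(points, jl_project(points, m)) for growing m gives a rough empirical feel for how the distortion shrinks as the target dimension increases.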

Cite as

Shaofeng H.-C. Jiang, Robert Krauthgamer, and Shay Sapir. Moderate Dimension Reduction for k-Center Clustering. In 40th International Symposium on Computational Geometry (SoCG 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 293, pp. 64:1-64:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{jiang_et_al:LIPIcs.SoCG.2024.64,
  author =	{Jiang, Shaofeng H.-C. and Krauthgamer, Robert and Sapir, Shay},
  title =	{{Moderate Dimension Reduction for k-Center Clustering}},
  booktitle =	{40th International Symposium on Computational Geometry (SoCG 2024)},
  pages =	{64:1--64:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-316-4},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{293},
  editor =	{Mulzer, Wolfgang and Phillips, Jeff M.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2024.64},
  URN =		{urn:nbn:de:0030-drops-200095},
  doi =		{10.4230/LIPIcs.SoCG.2024.64},
  annote =	{Keywords: Johnson-Lindenstrauss transform, dimension reduction, clustering, streaming algorithms}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2024.88

Color Fault-Tolerant Spanners

Authors: Asaf Petruschka, Shay Sapir, and Elad Tzalik

Published in: LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

Abstract

We initiate the study of spanners in arbitrarily vertex- or edge-colored graphs (with no "legality" restrictions), that are resilient to failures of entire color classes. When a color fails, all vertices/edges of that color crash. An f-color fault-tolerant (f-CFT) t-spanner of an n-vertex colored graph G is a subgraph H that preserves distances up to factor t, even in the presence of at most f color faults. This notion generalizes the well-studied f-vertex/edge fault-tolerant (f-V/EFT) spanners. The size (number of edges) of an f-V/EFT spanner crucially depends on the number f of vertex/edge faults to be tolerated. In the colored variants, even a single color fault can correspond to an unbounded number of vertex/edge faults. The key conceptual contribution of this work is in showing that the size required by an f-CFT spanner is in fact comparable to its uncolored counterpart, with no dependency on the size of color classes. We provide optimal bounds on the size required by f-CFT (2k-1)-spanners, as follows:
- When vertices have colors, we show an upper bound of O(f^{1-1/k} n^{1+1/k}) edges. This precisely matches the (tight) bounds for (2k-1)-spanners resilient to f individual vertex faults [Bodwin et al., SODA 2018; Bodwin and Patel, PODC 2019].
- For colored edges, we show that O(f n^{1+1/k}) edges are always sufficient. Further, we prove this is tight, i.e., we provide an Ω(f n^{1+1/k}) (worst-case) lower bound. The state-of-the-art bounds known for the corresponding uncolored setting of edge faults are (roughly) Θ(f^{1/2} n^{1+1/k}) [Bodwin et al., SODA 2018; Bodwin, Dinitz and Robelle, SODA 2022].
- We also consider a mixed model where both vertices and edges are colored. In this case, we show tight Θ(f^{2-1/k} n^{1+1/k}) bounds.
Thus, CFT spanners exhibit an interesting phenomenon: while (individual) edge faults are "easier" than vertex faults, edge-color faults are "harder" than vertex-color faults.
Our upper bounds are based on a generalization of the blocking set technique of [Bodwin and Patel, PODC 2019] for analyzing the (exponential-time) greedy algorithm for FT spanners. We complement them by providing efficient constructions of CFT spanners with similar size guarantees, based on the algorithm of [Dinitz and Robelle, PODC 2020].

Cite as

Asaf Petruschka, Shay Sapir, and Elad Tzalik. Color Fault-Tolerant Spanners. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 88:1-88:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{petruschka_et_al:LIPIcs.ITCS.2024.88,
  author =	{Petruschka, Asaf and Sapir, Shay and Tzalik, Elad},
  title =	{{Color Fault-Tolerant Spanners}},
  booktitle =	{15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =	{88:1--88:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-309-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{287},
  editor =	{Guruswami, Venkatesan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.88},
  URN =		{urn:nbn:de:0030-drops-196160},
  doi =		{10.4230/LIPIcs.ITCS.2024.88},
  annote =	{Keywords: Fault tolerance, Graph spanners}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2023.30

Lower Bounds for Pseudo-Deterministic Counting in a Stream

Authors: Vladimir Braverman, Robert Krauthgamer, Aditya Krishnan, and Shay Sapir

Published in: LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)

Abstract

Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary - for many streaming problems, both relaxations must be employed simultaneously, to avoid an exponentially larger (and often trivial) space complexity. A common drawback of these randomized approximate algorithms is that independent executions on the same input have different outputs, that depend on their random coins. Pseudo-deterministic algorithms combat this issue, and for every input, they output with high probability the same "canonical" solution. We consider perhaps the most basic problem in data streams, of counting the number of items in a stream of length at most n. Morris’s counter [CACM, 1978] is a randomized approximation algorithm for this problem that uses O(log log n) bits of space, for every fixed approximation factor (greater than 1). Goldwasser, Grossman, Mohanty and Woodruff [ITCS 2020] asked whether pseudo-deterministic approximation algorithms can match this space complexity. Our main result answers their question negatively, and shows that such algorithms must use Ω(√(log n / log log n)) bits of space. Our approach is based on a problem that we call Shift Finding, and may be of independent interest. In this problem, one has query access to a shifted version of a known string F ∈ {0,1}^{3n}, which is guaranteed to start with n zeros and end with n ones, and the goal is to find the unknown shift using a small number of queries. We provide for this problem an algorithm that uses O(√n) queries. It remains open whether poly(log n) queries suffice; if true, then our techniques immediately imply a nearly-tight Ω(log n/log log n) space bound for pseudo-deterministic approximate counting.
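
Illustrative sketch (not part of the DROPS record or the paper): the textbook form of Morris's counter keeps only an exponent x, increments it with probability 2^(-x), and reports 2^x - 1. Since x stays around log n, storing it takes about log log n bits. The class name MorrisCounter and its methods are hypothetical names chosen here; this is the basic single-counter variant, not a tuned version for a specific approximation factor.

```python
import random


class MorrisCounter:
    """Basic Morris approximate counter (illustrative sketch).

    Stores only the exponent x; the estimate 2**x - 1 equals the true
    count in expectation.
    """

    def __init__(self, rng=None):
        self.x = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Bump the exponent with probability 2**(-x).
        if self.rng.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        return 2 ** self.x - 1


def count_stream(length, seed):
    """Run one counter over a stream of `length` items and return its estimate."""
    counter = MorrisCounter(random.Random(seed))
    for _ in range(length):
        counter.increment()
    return counter.estimate()
```

Running count_stream on the same length with different seeds typically returns different estimates, which is the drawback of randomized approximate algorithms that the abstract contrasts with pseudo-deterministic ones.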

Cite as

Vladimir Braverman, Robert Krauthgamer, Aditya Krishnan, and Shay Sapir. Lower Bounds for Pseudo-Deterministic Counting in a Stream. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 30:1-30:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


@InProceedings{braverman_et_al:LIPIcs.ICALP.2023.30,
  author =	{Braverman, Vladimir and Krauthgamer, Robert and Krishnan, Aditya and Sapir, Shay},
  title =	{{Lower Bounds for Pseudo-Deterministic Counting in a Stream}},
  booktitle =	{50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages =	{30:1--30:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-278-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{261},
  editor =	{Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.30},
  URN =		{urn:nbn:de:0030-drops-180827},
  doi =		{10.4230/LIPIcs.ICALP.2023.30},
  annote =	{Keywords: streaming algorithms, pseudo-deterministic, approximate counting}
}

Search Results

Documents authored by Sapir, Shay
