DROPS

Document

DOI: 10.4230/LIPIcs.ITCS.2026.82

Dimension Reduction for Clustering: The Curious Case of Discrete Centers

Authors: Shaofeng H.-C. Jiang, Robert Krauthgamer, Shay Sapir, Sandeep Silwal, and Di Yue

Published in: LIPIcs, Volume 362, 17th Innovations in Theoretical Computer Science Conference (ITCS 2026)

Abstract

The Johnson-Lindenstrauss transform is a fundamental method for dimension reduction in Euclidean spaces, that can map any dataset of n points into dimension O(log n) with low distortion of their distances. This dimension bound is tight in general, but one can bypass it for specific problems. Indeed, tremendous progress has been made for clustering problems, especially in the continuous setting where centers can be picked from the ambient space ℝ^d. Most notably, for k-median and k-means, the dimension bound was improved to O(log k) [Makarychev, Makarychev and Razenshteyn, STOC 2019]. We explore dimension reduction for clustering in the discrete setting, where centers can only be picked from the dataset, and present two results that are both parameterized by the doubling dimension of the dataset, denoted as ddim. The first result shows that dimension O_{ε}(ddim + log k + log log n) suffices, and is moreover tight, to guarantee that the cost is preserved within factor 1±ε for every set of centers. Our second result eliminates the log log n term in the dimension through a relaxation of the guarantee (namely, preserving the cost only for all approximately-optimal sets of centers), which maintains its usefulness for downstream applications. Overall, we achieve strong dimension reduction in the discrete setting, and find that it differs from the continuous setting not only in the dimension bound, which depends on the doubling dimension, but also in the guarantees beyond preserving the optimal value, such as which clusterings are preserved.

Cite as

Shaofeng H.-C. Jiang, Robert Krauthgamer, Shay Sapir, Sandeep Silwal, and Di Yue. Dimension Reduction for Clustering: The Curious Case of Discrete Centers. In 17th Innovations in Theoretical Computer Science Conference (ITCS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 362, pp. 82:1-82:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.ITCS.2026.82,
  author =	{Jiang, Shaofeng H.-C. and Krauthgamer, Robert and Sapir, Shay and Silwal, Sandeep and Yue, Di},
  title =	{{Dimension Reduction for Clustering: The Curious Case of Discrete Centers}},
  booktitle =	{17th Innovations in Theoretical Computer Science Conference (ITCS 2026)},
  pages =	{82:1--82:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-410-9},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{362},
  editor =	{Saraf, Shubhangi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2026.82},
  URN =		{urn:nbn:de:0030-drops-253698},
  doi =		{10.4230/LIPIcs.ITCS.2026.82},
  annote =	{Keywords: dimension reduction, clustering, k-median, k-means, doubling dimension}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2026.54

Testable Algorithms for Approximately Counting Edges and Triangles in Sublinear Time and Space

Authors: Talya Eden, Ronitt Rubinfeld, and Arsen Vasilyan

Published in: LIPIcs, Volume 362, 17th Innovations in Theoretical Computer Science Conference (ITCS 2026)

Abstract

We consider the fundamental problems of approximately counting the numbers of edges and triangles in a graph in sublinear time. Previous algorithms for these tasks are significantly more efficient under a promise that the arboricity of the graph is bounded by some parameter ̅α. However, when this promise is violated, the estimates given by these algorithms are no longer guaranteed to be correct. For the triangle counting task, we give an algorithm that requires no promise on the input graph G, and computes a (1±ε)-approximation for the number of triangles t in G in time O^*((m⋅ α(G))/t + m/(t^{2/3)}), where α(G) is the arboricity of the graph. The algorithm can be used on any graph G (no prior knowledge of the arboricity α(G) is required), and the algorithm adapts its run-time on the fly based on the graph G. We accomplish this by trying a sequence of candidate values α̃ for α(G) and using a novel algorithm in the framework of testable algorithms. This ensures that wrong candidates α̃ cannot lead to wrong estimates: if the advice is incorrect, the algorithm either succeeds despite this or detects this and continues with a new candidate. Once the algorithm accepts the candidate, its output is guaranteed to be correct with high probability. We prove that this approach preserves - up to an additive overhead - the dramatic efficiency gains obtainable when good arboricity bounds are known in advance, while ensuring robustness against misleading advice. We further complement this result with a lower bound, showing that such an overhead is unavoidable whenever the advice may be faulty. We further demonstrate implications of our results for triangle counting in the streaming model.

Cite as

Talya Eden, Ronitt Rubinfeld, and Arsen Vasilyan. Testable Algorithms for Approximately Counting Edges and Triangles in Sublinear Time and Space. In 17th Innovations in Theoretical Computer Science Conference (ITCS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 362, pp. 54:1-54:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{eden_et_al:LIPIcs.ITCS.2026.54,
  author =	{Eden, Talya and Rubinfeld, Ronitt and Vasilyan, Arsen},
  title =	{{Testable Algorithms for Approximately Counting Edges and Triangles in Sublinear Time and Space}},
  booktitle =	{17th Innovations in Theoretical Computer Science Conference (ITCS 2026)},
  pages =	{54:1--54:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-410-9},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{362},
  editor =	{Saraf, Shubhangi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2026.54},
  URN =		{urn:nbn:de:0030-drops-253417},
  doi =		{10.4230/LIPIcs.ITCS.2026.54},
  annote =	{Keywords: Sublinear Algorithms, Triangle Counting, Edge Counting, Arboricity}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2026.18

Interactive Proofs for Distribution Testing with Conditional Oracles

Authors: Ari Biswas, Mark Bun, Clément L. Canonne, and Satchit Sivakumar

Published in: LIPIcs, Volume 362, 17th Innovations in Theoretical Computer Science Conference (ITCS 2026)

Abstract

We revisit the framework of interactive proofs for distribution testing, first introduced by Chiesa and Gur (ITCS 2018), which has recently experienced a surge in interest, accompanied by notable progress (e.g., Herman and Rothblum, STOC 2022, FOCS 2023; Herman, RANDOM 2024). In this model, a data-poor verifier determines whether a probability distribution has a property of interest by interacting with an all-powerful, data-rich but untrusted prover bent on convincing them that it has the property. While prior work gave sample-, time-, and communication-efficient protocols for testing and estimating a range of distribution properties, they all suffer from an inherent issue: for most interesting properties of distributions over a domain of size N, the verifier must draw at least Ω(√N) samples of its own. While sublinear in N, this is still prohibitive for large domains encountered in practice. In this work, we circumvent this limitation by augmenting the verifier with the ability to perform an exponentially smaller number of more powerful (but reasonable) pairwise conditional queries, effectively enabling them to perform "local comparison checks" of the prover’s claims. We systematically investigate the landscape of interactive proofs in this new setting, giving poly-logarithmic query and sample protocols for (tolerantly) testing all label-invariant properties, thus demonstrating exponential savings without compromising on communication, for this large and fundamental class of testing tasks.

Cite as

Ari Biswas, Mark Bun, Clément L. Canonne, and Satchit Sivakumar. Interactive Proofs for Distribution Testing with Conditional Oracles. In 17th Innovations in Theoretical Computer Science Conference (ITCS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 362, pp. 18:1-18:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{biswas_et_al:LIPIcs.ITCS.2026.18,
  author =	{Biswas, Ari and Bun, Mark and Canonne, Cl\'{e}ment L. and Sivakumar, Satchit},
  title =	{{Interactive Proofs for Distribution Testing with Conditional Oracles}},
  booktitle =	{17th Innovations in Theoretical Computer Science Conference (ITCS 2026)},
  pages =	{18:1--18:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-410-9},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{362},
  editor =	{Saraf, Shubhangi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2026.18},
  URN =		{urn:nbn:de:0030-drops-253059},
  doi =		{10.4230/LIPIcs.ITCS.2026.18},
  annote =	{Keywords: Distribution Testing, Interactive Proofs}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2026.30

New Bounds for Circular Trace Reconstruction

Authors: Arnav Burudgunte, Paul Valiant, and Hongao Wang

Published in: LIPIcs, Volume 362, 17th Innovations in Theoretical Computer Science Conference (ITCS 2026)

Abstract

The "trace reconstruction" problem asks, given an unknown binary string x and a channel that repeatedly returns "traces" of x with each bit randomly deleted with some probability p, how many traces are needed to recover x? There is an exponential gap between the best known upper and lower bounds for this problem. Many variants of the model have been introduced in hopes of motivating or revealing new approaches to narrow this gap. We study the variant of circular trace reconstruction introduced by Narayanan and Ren (ITCS 2021), in which traces undergo a random cyclic shift in addition to random deletions. We show an improved lower bound of Ω̃(n⁵) for circular trace reconstruction. This contrasts with the (previously) best known lower bounds of Ω̃(n³) in the circular case and Ω̃(n^{3/2}) in the linear case. Our bound shows the indistinguishability of traces from two sparse strings x,y that each have a constant number of nonzeros. Can this technique be extended significantly? How hard is it to reconstruct a sparse string x under a cyclic deletion channel? We resolve these questions by showing, using Fourier techniques, that Õ(n⁶) traces suffice for reconstructing any constant-sparse string in a circular deletion channel, in contrast to the best known upper bound of exp(Õ(n^{1/3})) for general strings in the circular deletion channel. This shows that new algorithms or new lower bounds must focus on non-constant-sparse strings.

Cite as

Arnav Burudgunte, Paul Valiant, and Hongao Wang. New Bounds for Circular Trace Reconstruction. In 17th Innovations in Theoretical Computer Science Conference (ITCS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 362, pp. 30:1-30:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{burudgunte_et_al:LIPIcs.ITCS.2026.30,
  author =	{Burudgunte, Arnav and Valiant, Paul and Wang, Hongao},
  title =	{{New Bounds for Circular Trace Reconstruction}},
  booktitle =	{17th Innovations in Theoretical Computer Science Conference (ITCS 2026)},
  pages =	{30:1--30:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-410-9},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{362},
  editor =	{Saraf, Shubhangi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2026.30},
  URN =		{urn:nbn:de:0030-drops-253176},
  doi =		{10.4230/LIPIcs.ITCS.2026.30},
  annote =	{Keywords: Trace reconstruction, algorithmic statistics, Fourier analysis}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.23

On the Instance Optimality of Detecting Collisions and Subgraphs

Authors: Omri Ben-Eliezer, Tomer Grossman, and Moni Naor

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

Suppose you are given a function f: [n] → [n] via (black-box) query access to the function. You are looking to find something local, like a collision (a pair x ≠ y s.t. f(x) = f(y)). The question is whether knowing the "shape" of the function helps you or not (by shape we mean that some permutation of the function is known). Formally, we investigate the unlabeled instance optimality of substructure detection problems in graphs and functions. A problem is g(n)-instance optimal if it admits an algorithm A satisfying that for any possible input, the (randomized) query complexity of A is at most g(n) times larger than the query complexity of any algorithm A' which solves the same problem while holding an unlabeled copy of the input (i.e., any A' that "knows the structure of the input"). Our results point to a trichotomy of unlabeled instance optimality among substructure detection problems in graphs and functions: - A few very simple properties have an O(1)-instance optimal algorithm. - Most properties of graphs and functions, with examples such as containing a fixed point or a 3-collision in functions, or a triangle in graphs, are n^{c}-far from instance optimal for some constant c > 0. - The problems of collision detection in functions and finding a claw in a graph serve as a middle ground between the two regimes. We show that these two properties are not Ω(log n)-instance optimal, and conjecture that this bound is tight. We provide evidence towards this conjecture, by proving that finding a claw in a graph is O(log(n))-instance optimal among all input graphs for which the query complexity of an algorithm holding an unlabeled certificate is O(√{n/(log n)}).

Cite as

Omri Ben-Eliezer, Tomer Grossman, and Moni Naor. On the Instance Optimality of Detecting Collisions and Subgraphs. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 23:1-23:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{beneliezer_et_al:LIPIcs.ICALP.2025.23,
  author =	{Ben-Eliezer, Omri and Grossman, Tomer and Naor, Moni},
  title =	{{On the Instance Optimality of Detecting Collisions and Subgraphs}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{23:1--23:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.23},
  URN =		{urn:nbn:de:0030-drops-234002},
  doi =		{10.4230/LIPIcs.ICALP.2025.23},
  annote =	{Keywords: instance optimality, instance complexity, unlabeled certificate, subgraph detection, collision detection}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.3

Near-Optimal Trace Reconstruction for Mildly Separated Strings

Authors: Anders Aamand, Allen Liu, and Shyam Narayanan

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

In the trace reconstruction problem our goal is to learn an unknown string x ∈ {0,1}ⁿ given independent traces of x. A trace is obtained by independently deleting each bit of x with some probability δ and concatenating the remaining bits. It is a major open question whether the trace reconstruction problem can be solved with a polynomial number of traces when the deletion probability δ is constant. The best known upper bound and lower bounds are respectively exp(Õ(n^{1/5})) [Zachary Chase, 2021a] and ̃ Ω(n^{3/2}) [Zachary Chase, 2021b]. Our main result is that if the string x is mildly separated, meaning that the number of zeros between any two ones in x is at least polylog n, and if δ is a sufficiently small constant, then the trace reconstruction problem can be solved with O(n log n) traces and in polynomial time.

Cite as

Anders Aamand, Allen Liu, and Shyam Narayanan. Near-Optimal Trace Reconstruction for Mildly Separated Strings. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 3:1-3:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{aamand_et_al:LIPIcs.ICALP.2025.3,
  author =	{Aamand, Anders and Liu, Allen and Narayanan, Shyam},
  title =	{{Near-Optimal Trace Reconstruction for Mildly Separated Strings}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{3:1--3:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.3},
  URN =		{urn:nbn:de:0030-drops-233801},
  doi =		{10.4230/LIPIcs.ICALP.2025.3},
  annote =	{Keywords: Trace Reconstruction, deletion channel, sample complexity}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.64

Fully Scalable MPC Algorithms for Euclidean k-Center

Authors: Artur Czumaj, Guichen Gao, Mohsen Ghaffari, and Shaofeng H.-C. Jiang

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

The k-center problem is a fundamental optimization problem with numerous applications in machine learning, data analysis, data mining, and communication networks. The k-center problem has been extensively studied in the classical sequential setting for several decades, and more recently there have been some efforts in understanding the problem in parallel computing, on the Massively Parallel Computation (MPC) model. For now, we have a good understanding of k-center in the case where each local MPC machine has sufficient local memory to store some representatives from each cluster, that is, when one has Ω(k) local memory per machine. While this setting covers the case of small values of k, for a large number of clusters these algorithms require undesirably large local memory, making them poorly scalable. The case of large k has been considered only recently for the fully scalable low-local-memory MPC model for the Euclidean instances of the k-center problem. However, the earlier works have been considering only the constant dimensional Euclidean space, required a super-constant number of rounds, and produced only k(1+o(1)) centers whose cost is a super-constant approximation of k-center. In this work, we significantly improve upon the earlier results for the k-center problem for the fully scalable low-local-memory MPC model. In the low dimensional Euclidean case in ℝ^d, we present the first constant-round fully scalable MPC algorithm for (2+ε)-approximation. We push the ratio further to (1 + ε)-approximation albeit using slightly more (1 + ε)k centers. All these results naturally extends to slightly super-constant values of d. In the high-dimensional regime, we provide the first fully scalable MPC algorithm that in a constant number of rounds achieves an O(log n/ log log n)-approximation for k-center.

Cite as

Artur Czumaj, Guichen Gao, Mohsen Ghaffari, and Shaofeng H.-C. Jiang. Fully Scalable MPC Algorithms for Euclidean k-Center. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 64:1-64:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{czumaj_et_al:LIPIcs.ICALP.2025.64,
  author =	{Czumaj, Artur and Gao, Guichen and Ghaffari, Mohsen and Jiang, Shaofeng H.-C.},
  title =	{{Fully Scalable MPC Algorithms for Euclidean k-Center}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{64:1--64:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.64},
  URN =		{urn:nbn:de:0030-drops-234416},
  doi =		{10.4230/LIPIcs.ICALP.2025.64},
  annote =	{Keywords: Massively Parallel Computing, Euclidean Spaces, k-Center Clustering}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2025.62

On Approximability of 𝓁₂² Min-Sum Clustering

Authors: Karthik C. S., Euiwoong Lee, Yuval Rabani, Chris Schwiegelshohn, and Samson Zhou

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)

Abstract

The 𝓁₂² min-sum k-clustering problem is to partition an input set into clusters C_1,…,C_k to minimize ∑_{i=1}^k ∑_{p,q ∈ C_i} ‖p-q‖₂². Although 𝓁₂² min-sum k-clustering is NP-hard, it is not known whether it is NP-hard to approximate 𝓁₂² min-sum k-clustering beyond a certain factor. In this paper, we give the first hardness-of-approximation result for the 𝓁₂² min-sum k-clustering problem. We show that it is NP-hard to approximate the objective to a factor better than 1.056 and moreover, assuming a balanced variant of the Johnson Coverage Hypothesis, it is NP-hard to approximate the objective to a factor better than 1.327. We then complement our hardness result by giving a fast PTAS for 𝓁₂² min-sum k-clustering. Specifically, our algorithm runs in time O(n^{1+o(1)}d⋅ 2^{(k/ε)^O(1)}), which is the first nearly linear time algorithm for this problem. We also consider a learning-augmented setting, where the algorithm has access to an oracle that outputs a label i ∈ [k] for input point, thereby implicitly partitioning the input dataset into k clusters that induce an approximately optimal solution, up to some amount of adversarial error α ∈ [0,1/2). We give a polynomial-time algorithm that outputs a (1+γα)/(1-α)²-approximation to 𝓁₂² min-sum k-clustering, for a fixed constant γ > 0.

Cite as

Karthik C. S., Euiwoong Lee, Yuval Rabani, Chris Schwiegelshohn, and Samson Zhou. On Approximability of 𝓁₂² Min-Sum Clustering. In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 62:1-62:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{karthikc.s._et_al:LIPIcs.SoCG.2025.62,
  author =	{Karthik C. S. and Lee, Euiwoong and Rabani, Yuval and Schwiegelshohn, Chris and Zhou, Samson},
  title =	{{On Approximability of 𝓁₂² Min-Sum Clustering}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{62:1--62:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.62},
  URN =		{urn:nbn:de:0030-drops-232142},
  doi =		{10.4230/LIPIcs.SoCG.2025.62},
  annote =	{Keywords: Clustering, hardness of approximation, polynomial-time approximation schemes, learning-augmented algorithms}
}

Document

DOI: 10.4230/LIPIcs.FORC.2025.6

Near-Universally-Optimal Differentially Private Minimum Spanning Trees

Authors: Richard Hladík and Jakub Tětek

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)

Abstract

Devising mechanisms with good beyond-worst-case input-dependent performance has been an important focus of differential privacy, with techniques such as smooth sensitivity, propose-test-release, or inverse sensitivity mechanism being developed to achieve this goal. This makes it very natural to use the notion of universal optimality in differential privacy. Universal optimality is a strong instance-specific optimality guarantee for problems on weighted graphs, which roughly states that for any fixed underlying (unweighted) graph, the algorithm is optimal in the worst-case sense, with respect to the possible setting of the edge weights. In this paper, we give the first such result in differential privacy. Namely, we prove that a simple differentially private mechanism for approximately releasing the minimum spanning tree is near-optimal in the sense of universal optimality for the 𝓁₁ neighbor relation. Previously, it was only known that this mechanism is nearly optimal in the worst case. We then focus on the 𝓁_∞ neighbor relation, for which the described mechanism is not optimal. We show that one may implement the exponential mechanism for MST in polynomial time, and that this results in universal near-optimality for both the 𝓁₁ and the 𝓁_∞ neighbor relations.

Cite as

Richard Hladík and Jakub Tětek. Near-Universally-Optimal Differentially Private Minimum Spanning Trees. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 6:1-6:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{hladik_et_al:LIPIcs.FORC.2025.6,
  author =	{Hlad{\'\i}k, Richard and T\v{e}tek, Jakub},
  title =	{{Near-Universally-Optimal Differentially Private Minimum Spanning Trees}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{6:1--6:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.6},
  URN =		{urn:nbn:de:0030-drops-231337},
  doi =		{10.4230/LIPIcs.FORC.2025.6},
  annote =	{Keywords: differential privacy, universal optimality, minimum spanning trees}
}

Document

DOI: 10.4230/LIPIcs.FORC.2025.9

Optimal Rates for Robust Stochastic Convex Optimization

Authors: Changyu Gao, Andrew Lowy, Xingyu Zhou, and Stephen J. Wright

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)

Abstract

Machine learning algorithms in high-dimensional settings are highly susceptible to the influence of even a small fraction of structured outliers, making robust optimization techniques essential. In particular, within the ε-contamination model, where an adversary can inspect and replace up to an ε-fraction of the samples, a fundamental open problem is determining the optimal rates for robust stochastic convex optimization (SCO) under such contamination. We develop novel algorithms that achieve minimax-optimal excess risk (up to logarithmic factors) under the ε-contamination model. Our approach improves over existing algorithms, which are not only suboptimal but also require stringent assumptions, including Lipschitz continuity and smoothness of individual sample functions. By contrast, our optimal algorithms do not require these stringent assumptions, assuming only population-level smoothness of the loss. Moreover, our algorithms can be adapted to handle the case in which the covariance parameter is unknown, and can be extended to nonsmooth population risks via convolutional smoothing. We complement our algorithmic developments with a tight information-theoretic lower bound for robust SCO.

Cite as

Changyu Gao, Andrew Lowy, Xingyu Zhou, and Stephen J. Wright. Optimal Rates for Robust Stochastic Convex Optimization. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 9:1-9:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gao_et_al:LIPIcs.FORC.2025.9,
  author =	{Gao, Changyu and Lowy, Andrew and Zhou, Xingyu and Wright, Stephen J.},
  title =	{{Optimal Rates for Robust Stochastic Convex Optimization}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{9:1--9:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.9},
  URN =		{urn:nbn:de:0030-drops-231369},
  doi =		{10.4230/LIPIcs.FORC.2025.9},
  annote =	{Keywords: Adversarial Robustness, Machine Learning, Optimization Algorithms, Robust Optimization, Stochastic Convex Optimization}
}

Document

DOI: 10.4230/LIPIcs.FORC.2025.15

Differentially Private High-Dimensional Approximate Range Counting, Revisited

Authors: Martin Aumüller, Fabrizio Boninsegna, and Francesco Silvestri

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)

Abstract

Locality Sensitive Filters are known for offering a quasi-linear space data structure with rigorous guarantees for the Approximate Near Neighbor search (ANN) problem. Building on Locality Sensitive Filters, we derive a simple data structure for the Approximate Near Neighbor Counting (ANNC) problem under differential privacy (DP). Moreover, we provide a simple analysis leveraging a connection with concomitant statistics and extreme value theory. Our approach produces a simple data structure with a tunable parameter that regulates a trade-off between space-time and utility. Through this trade-off, our data structure achieves the same performance as the recent findings of Andoni et al. (NeurIPS 2023) while offering better utility at the cost of higher space and query time. In addition, we provide a more efficient algorithm under pure ε-DP and elucidate the connection between ANN and differentially private ANNC. As a side result, the paper provides a more compact description and analysis of Locality Sensitive Filters for Fair Near Neighbor Search, improving a previous result in Aumüller et al. (TODS 2022).

Cite as

Martin Aumüller, Fabrizio Boninsegna, and Francesco Silvestri. Differentially Private High-Dimensional Approximate Range Counting, Revisited. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 15:1-15:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{aumuller_et_al:LIPIcs.FORC.2025.15,
  author =	{Aum\"{u}ller, Martin and Boninsegna, Fabrizio and Silvestri, Francesco},
  title =	{{Differentially Private High-Dimensional Approximate Range Counting, Revisited}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{15:1--15:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.15},
  URN =		{urn:nbn:de:0030-drops-231426},
  doi =		{10.4230/LIPIcs.FORC.2025.15},
  annote =	{Keywords: Differential Privacy, Locality Sensitive Filters, Approximate Range Counting, Concominant Statistics}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2025.13

Low Sensitivity Hopsets

Authors: Vikrant Ashvinkumar, Aaron Bernstein, Chengyuan Deng, Jie Gao, and Nicole Wein

Published in: LIPIcs, Volume 325, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025)

Abstract

Given a weighted graph G = (V,E,w), a (β, ε)-hopset H is an edge set such that for any s,t ∈ V, where s can reach t in G, there is a path from s to t in G ∪ H which uses at most β hops whose length is in the range [dist_G(s,t), (1+ε)dist_G(s,t)]. We break away from the traditional question that asks for a hopset H that achieves small |H| and small diameter β and instead study the sensitivity of H, a new quality measure. The sensitivity of a vertex (or edge) given a hopset H is, informally, the number of times a single hop in G ∪ H bypasses it; a bit more formally, assuming shortest paths in G are unique, it is the number of hopset edges (s,t) ∈ H such that the vertex (or edge) is contained in the unique st-path in G having length exactly dist_G(s,t). The sensitivity associated with H is then the maximum sensitivity over all vertices (or edges). The highlights of our results are: - A construction for (Õ(√n), 0)-hopsets on undirected graphs with O(log n) sensitivity, complemented with a lower bound showing that Õ(√n) is tight up to polylogarithmic factors for any construction with polylogarithmic sensitivity. - A construction for (n^o(1), ε)-hopsets on undirected graphs with n^o(1) sensitivity for any ε > 0 that is at least inverse polylogarithmic, complemented with a lower bound on the tradeoff between β, ε, and the sensitivity. - We define a notion of sensitivity for β-shortcut sets (which are the reachability analogues of hopsets) and give a construction for Õ(√n)-shortcut sets on directed graphs with O(log n) sensitivity, complemented with a lower bound showing that β = Ω̃(n^{1/3}) for any construction with polylogarithmic sensitivity. We believe hopset sensitivity is a natural measure in and of itself, and could potentially find use in a diverse range of contexts. More concretely, the notion of hopset sensitivity is also directly motivated by the Differentially Private All Sets Range Queries problem [Deng et al. WADS 23]. Our result for O(log n) sensitivity (Õ(√n), 0)-hopsets on undirected graphs immediately improves the current best-known upper bound on utility from Õ(n^{1/3}) to Õ(n^{1/4}) in the pure-DP setting, which is tight up to polylogarithmic factors.

Cite as

Vikrant Ashvinkumar, Aaron Bernstein, Chengyuan Deng, Jie Gao, and Nicole Wein. Low Sensitivity Hopsets. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 13:1-13:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{ashvinkumar_et_al:LIPIcs.ITCS.2025.13,
  author =	{Ashvinkumar, Vikrant and Bernstein, Aaron and Deng, Chengyuan and Gao, Jie and Wein, Nicole},
  title =	{{Low Sensitivity Hopsets}},
  booktitle =	{16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
  pages =	{13:1--13:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-361-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{325},
  editor =	{Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.13},
  URN =		{urn:nbn:de:0030-drops-226418},
  doi =		{10.4230/LIPIcs.ITCS.2025.13},
  annote =	{Keywords: Hopsets, Shortcuts, Sensitivity, Differential Privacy}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2025.44

Learning-Augmented Streaming Algorithms for Approximating MAX-CUT

Authors: Yinhao Dong, Pan Peng, and Ali Vakilian

Published in: LIPIcs, Volume 325, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025)

Abstract

We study learning-augmented streaming algorithms for estimating the value of MAX-CUT in a graph. In the classical streaming model, while a 1/2-approximation for estimating the value of MAX-CUT can be trivially achieved with O(1) words of space, Kapralov and Krachun [STOC’19] showed that this is essentially the best possible: for any ε > 0, any (randomized) single-pass streaming algorithm that achieves an approximation ratio of at least 1/2 + ε requires Ω(n / 2^poly(1/ε)) space. We show that it is possible to surpass the 1/2-approximation barrier using just O(1) words of space by leveraging a (machine learned) oracle. Specifically, we consider streaming algorithms that are equipped with an ε-accurate oracle that for each vertex in the graph, returns its correct label in {-1, +1}, corresponding to an optimal MAX-CUT solution in the graph, with some probability 1/2 + ε, and the incorrect label otherwise. Within this framework, we present a single-pass algorithm that approximates the value of MAX-CUT to within a factor of 1/2 + Ω(ε²) with probability at least 2/3 for insertion-only streams, using only poly(1/ε) words of space. We also extend our algorithm to fully dynamic streams while maintaining a space complexity of poly(1/ε,log n) words.

Cite as

Yinhao Dong, Pan Peng, and Ali Vakilian. Learning-Augmented Streaming Algorithms for Approximating MAX-CUT. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 44:1-44:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{dong_et_al:LIPIcs.ITCS.2025.44,
  author =	{Dong, Yinhao and Peng, Pan and Vakilian, Ali},
  title =	{{Learning-Augmented Streaming Algorithms for Approximating MAX-CUT}},
  booktitle =	{16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
  pages =	{44:1--44:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-361-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{325},
  editor =	{Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.44},
  URN =		{urn:nbn:de:0030-drops-226728},
  doi =		{10.4230/LIPIcs.ITCS.2025.44},
  annote =	{Keywords: Learning-Augmented Algorithms, Graph Streaming Algorithms, MAX-CUT}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2025.70

Facility Location on High-Dimensional Euclidean Spaces

Authors: Euiwoong Lee and Kijun Shin

Published in: LIPIcs, Volume 325, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025)

Abstract

Recent years have seen great progress in the approximability of fundamental clustering and facility location problems on high-dimensional Euclidean spaces, including k-Means and k-Median. While they admit strictly better approximation ratios than their general metric versions, their approximation ratios are still higher than the hardness ratios for general metrics, leaving the possibility that the ultimate optimal approximation ratios will be the same between Euclidean and general metrics. Moreover, such an improved algorithm for Euclidean spaces is not known for Uncapaciated Facility Location (UFL), another fundamental problem in the area. In this paper, we prove that for any γ ≥ 1.6774 there exists ε > 0 such that Euclidean UFL admits a (γ, 1 + 2e^{-γ} - ε)-bifactor approximation algorithm, improving the result of Byrka and Aardal [Byrka and Aardal, 2010]. Together with the (γ, 1 + 2e^{-γ}) NP-hardness in general metrics, it shows the first separation between general and Euclidean metrics for the aforementioned basic problems. We also present an (α_Li - ε)-(unifactor) approximation algorithm for UFL for some ε > 0 in Euclidean spaces, where α_Li ≈ 1.488 is the best-known approximation ratio for UFL by Li [Li, 2013].

Cite as

Euiwoong Lee and Kijun Shin. Facility Location on High-Dimensional Euclidean Spaces. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 70:1-70:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{lee_et_al:LIPIcs.ITCS.2025.70,
  author =	{Lee, Euiwoong and Shin, Kijun},
  title =	{{Facility Location on High-Dimensional Euclidean Spaces}},
  booktitle =	{16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
  pages =	{70:1--70:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-361-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{325},
  editor =	{Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.70},
  URN =		{urn:nbn:de:0030-drops-226982},
  doi =		{10.4230/LIPIcs.ITCS.2025.70},
  annote =	{Keywords: Approximation Algorithms, Clustering, Facility Location}
}

Document

Position

DOI: 10.4230/TGDK.1.1.2

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Authors: Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, and Damien Graux

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

Large Language Models (LLMs) have taken Knowledge Representation - and the world - by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.

Cite as

Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, and Damien Graux. Large Language Models and Knowledge Graphs: Opportunities and Challenges. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 2:1-2:38, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{pan_et_al:TGDK.1.1.2,
  author =	{Pan, Jeff Z. and Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Chen, Jiaoyan and Dietze, Stefan and Jabeen, Hajira and Omeliyanenko, Janna and Zhang, Wen and Lissandrini, Matteo and Biswas, Russa and de Melo, Gerard and Bonifati, Angela and Vakaj, Edlira and Dragoni, Mauro and Graux, Damien},
  title =	{{Large Language Models and Knowledge Graphs: Opportunities and Challenges}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{2:1--2:38},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.2},
  URN =		{urn:nbn:de:0030-drops-194766},
  doi =		{10.4230/TGDK.1.1.2},
  annote =	{Keywords: Large Language Models, Pre-trained Language Models, Knowledge Graphs, Ontology, Retrieval Augmented Language Models}
}

22 Search Results for "Narayanan, Shyam"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message