DROPS

Document

Research

On the Computational Cost of Knowledge Graph Embeddings

Authors: Victor Charpenay, Mansour Zoubeirou A Mayaki, and Antoine Zimmermann

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 4, Issue 1 (2026)

Abstract

Over the past decade, numerous Knowledge Graph Embedding (KGE) models have been designed and evaluated on reference datasets, always with increasing performance. In this paper, we re-evaluate these models with respect to their computational efficiency during training, by estimating the computational cost of the procedure expressed in floating-point operations. We design a cost model based on analytical expressions and apply it to a collection of 20 KGE models, representative of the state of the art. We show that neither dimensionality nor parameter efficiency, both used in the literature to compare models with each other, is suitable for evaluating the true cost of models. Through fixed-budget experiments, a novel approach to evaluating KGE models based on cost estimates, we re-assess the relative performance of model families compared to the state of the art. Bilinear models such as ComplEx underperform with a low computational budget, while hyperbolic linear models appear to offer no particular benefit compared to simpler Euclidean models, especially the MuRE model. Neural models, such as ConvE or CompGCN, achieve reasonable performance in the literature, but their high computational cost appears unnecessary when compared with other models. The trade-off between efficiency and expressivity of both linear and neural models remains to be further explored.

Cite as

Victor Charpenay, Mansour Zoubeirou A Mayaki, and Antoine Zimmermann. On the Computational Cost of Knowledge Graph Embeddings. In Transactions on Graph Data and Knowledge (TGDK), Volume 4, Issue 1, pp. 1:1-1:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@Article{charpenay_et_al:TGDK.4.1.1,
  author =	{Charpenay, Victor and Zoubeirou A Mayaki, Mansour and Zimmermann, Antoine},
  title =	{{On the Computational Cost of Knowledge Graph Embeddings}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{1:1--1:30},
  ISSN =	{2942-7517},
  year =	{2026},
  volume =	{4},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.4.1.1},
  URN =		{urn:nbn:de:0030-drops-256863},
  doi =		{10.4230/TGDK.4.1.1},
  annote =	{Keywords: Knowledge Graph Embedding, Parameter Efficiency, Computational Budget, Green AI}
}

Document

Survey

DOI: 10.4230/TGDK.3.2.1

Resilience in Knowledge Graph Embeddings

Authors: Arnab Sharma, N'Dah Jean Kouagou, and Axel-Cyrille Ngonga Ngomo

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2 (2025)

Abstract

In recent years, knowledge graphs have gained interest and witnessed widespread applications in various domains, such as information retrieval, question answering, and recommendation systems. Large-scale knowledge graphs have demonstrated their utility in effectively representing structured knowledge. To further facilitate the application of machine learning techniques, knowledge graph embedding (KGE) models have been developed. Such models transform entities and relationships within knowledge graphs into vectors. However, these embedding models often face challenges related to noise, missing information, distribution shift, adversarial attacks, etc. This can lead to sub-optimal embeddings and incorrect inferences, thereby negatively impacting downstream applications. While the existing literature has focused so far on adversarial attacks on KGE models, the challenges related to the other critical aspects remain unexplored. In this paper, we first give a unified definition of resilience, encompassing several factors such as generalization, in-distribution generalization, distribution adaption, and robustness. After formalizing these concepts for machine learning in general, we define them in the context of knowledge graphs. To identify the gaps in existing work on resilience in the context of knowledge graphs, we perform a systematic survey that takes all of these aspects into account. Our survey results show that most existing works focus on one specific aspect of resilience, namely robustness. After categorizing such works by the aspect of resilience they address, we discuss the challenges and future research directions.

Cite as

Arnab Sharma, N'Dah Jean Kouagou, and Axel-Cyrille Ngonga Ngomo. Resilience in Knowledge Graph Embeddings. In Transactions on Graph Data and Knowledge (TGDK), Volume 3, Issue 2, pp. 1:1-1:38, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@Article{sharma_et_al:TGDK.3.2.1,
  author =	{Sharma, Arnab and Kouagou, N'Dah Jean and Ngonga Ngomo, Axel-Cyrille},
  title =	{{Resilience in Knowledge Graph Embeddings}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{1:1--1:38},
  ISSN =	{2942-7517},
  year =	{2025},
  volume =	{3},
  number =	{2},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.2.1},
  URN =		{urn:nbn:de:0030-drops-248117},
  doi =		{10.4230/TGDK.3.2.1},
  annote =	{Keywords: Knowledge graphs, Resilience, Robustness}
}

Document

DOI: 10.4230/LIPIcs.ESA.2025.42

(Multivariate) k-SUM as Barrier to Succinct Computation

Authors: Geri Gokaj, Marvin Künnemann, Sabine Storandt, and Carina Truschel

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

How does the time complexity of a problem change when the input is given succinctly rather than explicitly? We study this question for several geometric problems defined on a set X of N points in ℤ^d. As succinct representation, we choose a sumset (or Minkowski sum) representation: Instead of receiving X explicitly, we are given sets A,B of n points that define X as A+B = {a+b∣ a ∈ A,b ∈ B}. We investigate the fine-grained complexity of this succinct version for several Õ(N)-time computable geometric primitives. Remarkably, we can tie their complexity tightly to the complexity of corresponding k-SUM problems. Specifically, we introduce as All-ints 3-SUM(n,n,k) the following multivariate, multi-output variant of 3-SUM: given sets A,B of size n and set C of size k, determine for all c ∈ C whether there are a ∈ A and b ∈ B with a+b = c. We obtain the following results: 1) Succinct closest L_∞-pair requires time N^{1-o(1)} under the 3-SUM hypothesis, while succinct furthest L_∞-pair can be solved in time Õ(n). 2) Succinct bichromatic closest L_∞-Pair requires time N^{1-o(1)} iff the 4-SUM hypothesis holds. 3) The following problems are fine-grained equivalent to All-ints 3-SUM(n,n,k): succinct skyline computation in 2D with output size k and succinct batched orthogonal range search with k given ranges. This establishes conditionally tight Õ(min{nk, N})-time algorithms for these problems. We obtain further connections with All-ints 3-SUM(n,n,k) for succinctly computing independent sets in unit interval graphs. Thus, (Multivariate) k-SUM problems precisely capture the barrier for enabling sumset-succinct computation for various geometric primitives.
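The All-ints 3-SUM(n,n,k) problem defined in the abstract can be illustrated with a minimal sketch. It assumes one-dimensional integer inputs and uses the straightforward hash-set approach, which takes roughly O(nk) expected time; the function name `all_ints_3sum` is hypothetical and this is not the paper's algorithm.

```python
def all_ints_3sum(A, B, C):
    """For every c in C, report whether some a in A and b in B satisfy a + b == c.

    Illustrative baseline only: integers instead of points in Z^d,
    and a plain hash-set lookup per (a, c) pair.
    """
    b_set = set(B)  # constant-time expected membership tests
    return {c: any((c - a) in b_set for a in A) for c in C}


# The sumset A + B here is {3, 4, 6, 7, 9, 10}.
answers = all_ints_3sum([1, 4, 7], [2, 3], [3, 10, 12, 6])
```

When k is close to n², enumerating the whole sumset A + B once (about N = n² sums) is the cheaper option, which matches the min{nk, N} shape of the bound quoted above.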

Cite as

Geri Gokaj, Marvin Künnemann, Sabine Storandt, and Carina Truschel. (Multivariate) k-SUM as Barrier to Succinct Computation. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 42:1-42:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{gokaj_et_al:LIPIcs.ESA.2025.42,
  author =	{Gokaj, Geri and K\"{u}nnemann, Marvin and Storandt, Sabine and Truschel, Carina},
  title =	{{(Multivariate) k-SUM as Barrier to Succinct Computation}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{42:1--42:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.42},
  URN =		{urn:nbn:de:0030-drops-245101},
  doi =		{10.4230/LIPIcs.ESA.2025.42},
  annote =	{Keywords: Fine-grained complexity theory, sumsets, additive combinatorics, succinct inputs, computational geometry}
}

Document

DOI: 10.4230/LIPIcs.SAT.2025.32

Efficient Certified Reasoning for Binarized Neural Networks

Authors: Jiong Yang, Yong Kiam Tan, Mate Soos, Magnus O. Myreen, and Kuldeep S. Meel

Published in: LIPIcs, Volume 341, 28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025)

Abstract

Neural networks have emerged as essential components in safety-critical applications - these use cases demand complex, yet trustworthy computations. Binarized Neural Networks (BNNs) are a type of neural network where each neuron is constrained to a Boolean value; they are particularly well-suited for safety-critical tasks because they retain much of the computational capacities of full-scale (floating-point or quantized) deep neural networks, but remain compatible with satisfiability solvers for qualitative verification and with model counters for quantitative reasoning. However, existing methods for BNN analysis suffer from either limited scalability or susceptibility to soundness errors, which hinders their applicability in real-world scenarios. In this work, we present a scalable and trustworthy approach for both qualitative and quantitative verification of BNNs. Our approach introduces a native representation of BNN constraints in a custom-designed solver for qualitative reasoning, and in an approximate model counter for quantitative reasoning. We further develop specialized proof generation and checking pipelines with native support for BNN constraint reasoning, ensuring trustworthiness for all of our verification results. Empirical evaluations on a BNN robustness verification benchmark suite demonstrate that our certified solving approach achieves a 9× speedup over prior certified CNF and PB-based approaches, and our certified counting approach achieves a 218× speedup over the existing CNF-based baseline. In terms of coverage, our pipeline produces fully certified results for 99% and 86% of the qualitative and quantitative reasoning queries on BNNs, respectively. This is in sharp contrast to the best existing baselines which can fully certify only 62% and 4% of the queries, respectively.

Cite as

Jiong Yang, Yong Kiam Tan, Mate Soos, Magnus O. Myreen, and Kuldeep S. Meel. Efficient Certified Reasoning for Binarized Neural Networks. In 28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 341, pp. 32:1-32:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{yang_et_al:LIPIcs.SAT.2025.32,
  author =	{Yang, Jiong and Tan, Yong Kiam and Soos, Mate and Myreen, Magnus O. and Meel, Kuldeep S.},
  title =	{{Efficient Certified Reasoning for Binarized Neural Networks}},
  booktitle =	{28th International Conference on Theory and Applications of Satisfiability Testing (SAT 2025)},
  pages =	{32:1--32:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-381-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{341},
  editor =	{Berg, Jeremias and Nordstr\"{o}m, Jakob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SAT.2025.32},
  URN =		{urn:nbn:de:0030-drops-237665},
  doi =		{10.4230/LIPIcs.SAT.2025.32},
  annote =	{Keywords: Neural network verification, proof certification, SAT solving, approximate model counting}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.11

CluStRE: Streaming Graph Clustering with Multi-Stage Refinement

Authors: Adil Chhabra, Shai Dorian Peretz, and Christian Schulz

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

We present CluStRE, a novel streaming graph clustering algorithm that balances computational efficiency with high-quality clustering using multi-stage refinement. Unlike traditional in-memory clustering approaches, CluStRE processes graphs in a streaming setting, significantly reducing memory overhead while leveraging re-streaming and evolutionary heuristics to improve solution quality. Our method dynamically constructs a quotient graph, enabling modularity-based optimization while efficiently handling large-scale graphs. We introduce multiple configurations of CluStRE to provide trade-offs between speed, memory consumption, and clustering quality. Experimental evaluations demonstrate that CluStRE improves solution quality by 89.8%, operates 2.6× faster, and uses less than two-thirds of the memory required by the state-of-the-art streaming clustering algorithm on average. Moreover, our strongest mode enhances solution quality by up to 150% on average. With this, CluStRE achieves comparable solution quality to in-memory algorithms, i.e. over 96% of the quality of clustering approaches, including Louvain, effectively bridging the gap between streaming and traditional clustering methods.

Cite as

Adil Chhabra, Shai Dorian Peretz, and Christian Schulz. CluStRE: Streaming Graph Clustering with Multi-Stage Refinement. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{chhabra_et_al:LIPIcs.SEA.2025.11,
  author =	{Chhabra, Adil and Dorian Peretz, Shai and Schulz, Christian},
  title =	{{CluStRE: Streaming Graph Clustering with Multi-Stage Refinement}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.11},
  URN =		{urn:nbn:de:0030-drops-232493},
  doi =		{10.4230/LIPIcs.SEA.2025.11},
  annote =	{Keywords: graph clustering, community, streaming, online, memetic, evolutionary}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.19

Dynamic Algorithms for Submodular Matching

Authors: Kiarash Banihashem, Leyla Biabani, Samira Goudarzi, MohammadTaghi Hajiaghayi, Peyman Jabbarzade, and Morteza Monemizadeh

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

The Maximum Submodular Matching (MSM) problem is a generalization of the classical Maximum Weight Matching (MWM) problem. In this problem, given a monotone submodular function f: 2^E → ℝ^{≥ 0} defined over subsets of edges of a graph G(V, E), we are asked to return a matching whose submodular value is maximum among all matchings in graph G(V, E). In this paper, we consider this problem in a fully dynamic setting against an oblivious adversary. In this setting, we are given a sequence 𝒮 of insertions and deletions of edges of the underlying graph G(V, E), along with an oracle access to the monotone submodular function f. The goal is to maintain a matching M such that, at any time t of sequence 𝒮, its submodular value is a good approximation of the value of the optimal submodular matching while keeping the number of operations minimal. We develop the first dynamic algorithm for the submodular matching problem, in which we maintain a matching whose submodular value is within expected (8 + ε)-approximation of the optimal submodular matching at any time t of sequence 𝒮 using expected amortized poly(log n, 1/(ε)) update time. Our approach incorporates a range of novel techniques, notably the concept of Uniform Hierarchical Caches (UHC) data structure along with its invariants, which lead to the first algorithm for fully dynamic submodular matching and may be of independent interest for designing dynamic algorithms for other problems.

Cite as

Kiarash Banihashem, Leyla Biabani, Samira Goudarzi, MohammadTaghi Hajiaghayi, Peyman Jabbarzade, and Morteza Monemizadeh. Dynamic Algorithms for Submodular Matching. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 19:1-19:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{banihashem_et_al:LIPIcs.ICALP.2025.19,
  author =	{Banihashem, Kiarash and Biabani, Leyla and Goudarzi, Samira and Hajiaghayi, MohammadTaghi and Jabbarzade, Peyman and Monemizadeh, Morteza},
  title =	{{Dynamic Algorithms for Submodular Matching}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{19:1--19:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.19},
  URN =		{urn:nbn:de:0030-drops-233969},
  doi =		{10.4230/LIPIcs.ICALP.2025.19},
  annote =	{Keywords: Matching, Submodular, Dynamic, Polylogarithmic}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.18

q-Partitioning Valuations: Exploring the Space Between Subadditive and Fractionally Subadditive Valuations

Authors: Kiril Bangachev and S. Matthew Weinberg

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

For a set M of m elements, we define a decreasing chain of classes of normalized monotone-increasing valuation functions from 2^M to ℝ_{≥ 0}, parameterized by an integer q ∈ [2,m]. For a given q, we refer to the class as q-partitioning. A valuation function is subadditive if and only if it is 2-partitioning, and fractionally subadditive if and only if it is m-partitioning. Thus, our chain establishes an interpolation between subadditive and fractionally subadditive valuations. We show that this interpolation is smooth (q-partitioning valuations are "nearly" (q-1)-partitioning in a precise sense, Theorem 6), interpretable (the definition arises by analyzing the core of a cost-sharing game, à la the Bondareva-Shapley Theorem for fractionally subadditive valuations, Section 3.1), and non-trivial (the class of q-partitioning valuations is distinct for all q, Proposition 3). For domains where provable separations exist between subadditive and fractionally subadditive, we interpolate the stronger guarantees achievable for fractionally subadditive valuations to all q ∈ {2,…, m}. Two highlights are the following: 1) An Ω ((log log q)/(log log m))-competitive posted price mechanism for q-partitioning valuations. Note that this matches asymptotically the state-of-the-art for both subadditive (q = 2) [Paul Dütting et al., 2020], and fractionally subadditive (q = m) [Feldman et al., 2015]. 2) Two upper-tail concentration inequalities on 1-Lipschitz, q-partitioning valuations over independent items. One extends the state-of-the-art for q = m to q < m, the other improves the state-of-the-art for q = 2 for q > 2. Our concentration inequalities imply several corollaries that interpolate between subadditive and fractionally subadditive, for example: 𝔼[v(S)] ≤ (1 + 1/log q)Median[v(S)] + O(log q). To prove this, we develop a new isoperimetric inequality using Talagrand’s method of control by q points, which may be of independent interest. 
We also discuss other probabilistic inequalities and game-theoretic applications of q-partitioning valuations, and connections to subadditive MPH-k valuations [Tomer Ezra et al., 2019].

Cite as

Kiril Bangachev and S. Matthew Weinberg. q-Partitioning Valuations: Exploring the Space Between Subadditive and Fractionally Subadditive Valuations. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 18:1-18:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{bangachev_et_al:LIPIcs.ICALP.2025.18,
  author =	{Bangachev, Kiril and Weinberg, S. Matthew},
  title =	{{q-Partitioning Valuations: Exploring the Space Between Subadditive and Fractionally Subadditive Valuations}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{18:1--18:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.18},
  URN =		{urn:nbn:de:0030-drops-233956},
  doi =		{10.4230/LIPIcs.ICALP.2025.18},
  annote =	{Keywords: Subadditive Functions, Fractionally Subadditive Functions, Posted Price Mechanisms, Concentration Inequalities}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.101

Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case

Authors: Shaofeng H.-C. Jiang and Jianing Lou

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

We devise ε-coresets for robust (k,z)-Clustering with m outliers through black-box reductions to vanilla clustering. Given an ε-coreset construction for vanilla clustering with size N, we construct coresets of size N⋅ polylog(kmε^{-1}) + O_z(min{kmε^{-1}, m ε^{-2z}log^z(kmε^{-1})}) for various metric spaces, where O_z hides 2^{O(zlog z)} factors. This increases the size of the vanilla coreset by a small multiplicative factor of polylog(kmε^{-1}), and the additive term is up to a (ε^{-1}log (km))^{O(z)} factor to the size of the optimal robust coreset. Plugging in recent vanilla coreset results of [Cohen-Addad, Saulpic and Schwiegelshohn, STOC'21; Cohen-Addad, Draganov, Russo, Saulpic and Schwiegelshohn, SODA'25], we obtain the first coresets for (k,z)-Clustering with m outliers with size near-linear in k while previous results have size at least Ω(k²) [Huang, Jiang, Lou and Wu, ICLR'23; Huang, Li, Lu and Wu, SODA'25]. Technically, we establish two conditions under which a vanilla coreset is as well a robust coreset. The first condition requires the dataset to satisfy special structures - it can be broken into "dense" parts with bounded diameter. We combine this with a new bounded-diameter decomposition that has only O_z(km ε^{-1}) non-dense points to obtain the O_z(km ε^{-1}) additive bound. Another sufficient condition requires the vanilla coreset to possess an extra size-preserving property. To utilize this condition, we further give a black-box reduction that turns a vanilla coreset to the one that satisfies the said size-preserving property, and this leads to the alternative O_z(mε^{-2z}log^{z}(kmε^{-1})) additive size bound. We also give low-space implementations of our reductions in the dynamic streaming setting. 
Combined with known streaming constructions for vanilla coresets [Braverman, Frahling, Lang, Sohler and Yang, ICML'17; Hu, Song, Yang and Zhong, arXiv'1802.00459], we obtain the first dynamic streaming algorithms for coresets for k-Median (and k-Means) with m outliers, using space Õ(k + m) ⋅ poly(dε^{-1}log Δ) for inputs on a discrete grid [Δ]^d.

Cite as

Shaofeng H.-C. Jiang and Jianing Lou. Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 101:1-101:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{jiang_et_al:LIPIcs.ICALP.2025.101,
  author =	{Jiang, Shaofeng H.-C. and Lou, Jianing},
  title =	{{Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{101:1--101:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.101},
  URN =		{urn:nbn:de:0030-drops-234781},
  doi =		{10.4230/LIPIcs.ICALP.2025.101},
  annote =	{Keywords: Coresets, clustering, outliers, streaming algorithms}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2025.62

On Approximability of 𝓁₂² Min-Sum Clustering

Authors: Karthik C. S., Euiwoong Lee, Yuval Rabani, Chris Schwiegelshohn, and Samson Zhou

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)

Abstract

The 𝓁₂² min-sum k-clustering problem is to partition an input set into clusters C_1,…,C_k to minimize ∑_{i=1}^k ∑_{p,q ∈ C_i} ‖p-q‖₂². Although 𝓁₂² min-sum k-clustering is NP-hard, it is not known whether it is NP-hard to approximate 𝓁₂² min-sum k-clustering beyond a certain factor. In this paper, we give the first hardness-of-approximation result for the 𝓁₂² min-sum k-clustering problem. We show that it is NP-hard to approximate the objective to a factor better than 1.056 and moreover, assuming a balanced variant of the Johnson Coverage Hypothesis, it is NP-hard to approximate the objective to a factor better than 1.327. We then complement our hardness result by giving a fast PTAS for 𝓁₂² min-sum k-clustering. Specifically, our algorithm runs in time O(n^{1+o(1)}d⋅ 2^{(k/ε)^{O(1)}}), which is the first nearly linear time algorithm for this problem. We also consider a learning-augmented setting, where the algorithm has access to an oracle that outputs a label i ∈ [k] for each input point, thereby implicitly partitioning the input dataset into k clusters that induce an approximately optimal solution, up to some amount of adversarial error α ∈ [0,1/2). We give a polynomial-time algorithm that outputs a (1+γα)/(1-α)²-approximation to 𝓁₂² min-sum k-clustering, for a fixed constant γ > 0.
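The objective stated at the start of the abstract can be evaluated directly for a given partition. The sketch below assumes points are tuples of numbers and reads the inner sum as ranging over ordered pairs (p, q), so each unordered pair is counted twice; the function name `min_sum_cost` is hypothetical.

```python
def min_sum_cost(clusters):
    """Sum of squared Euclidean distances over all ordered pairs within each cluster.

    clusters: list of clusters, each a list of equal-length numeric tuples.
    Assumption: ordered pairs, so every unordered pair contributes twice.
    """
    total = 0
    for cluster in clusters:
        for p in cluster:
            for q in cluster:
                total += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return total


# First cluster contributes 2 * 1 = 2; second contributes 2 * (4 + 16 + 4) = 48.
cost = min_sum_cost([[(0, 0), (1, 0)], [(0, 0), (0, 2), (0, 4)]])
```

This quadratic-time evaluation only scores a fixed clustering. Finding a partition that minimizes the cost is the hard problem the paper studies.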

Cite as

Karthik C. S., Euiwoong Lee, Yuval Rabani, Chris Schwiegelshohn, and Samson Zhou. On Approximability of 𝓁₂² Min-Sum Clustering. In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 62:1-62:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{karthikc.s._et_al:LIPIcs.SoCG.2025.62,
  author =	{{Karthik C. S.} and Lee, Euiwoong and Rabani, Yuval and Schwiegelshohn, Chris and Zhou, Samson},
  title =	{{On Approximability of 𝓁₂² Min-Sum Clustering}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{62:1--62:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.62},
  URN =		{urn:nbn:de:0030-drops-232142},
  doi =		{10.4230/LIPIcs.SoCG.2025.62},
  annote =	{Keywords: Clustering, hardness of approximation, polynomial-time approximation schemes, learning-augmented algorithms}
}

Document

DOI: 10.4230/LIPIcs.FORC.2025.3

Private Estimation When Data and Privacy Demands Are Correlated

Authors: Syomantak Chaudhuri and Thomas A. Courtade

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)

Abstract

Differential Privacy (DP) is the current gold-standard for ensuring privacy for statistical queries. Estimation problems under DP constraints appearing in the literature have largely focused on providing equal privacy to all users. We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, both subject to heterogeneous privacy constraints. Each user, contributing a sample to the dataset, is allowed to have a different privacy demand. The dataset itself is assumed to be worst-case and we study both problems under two different formulations - first, where privacy demands and data may be correlated, and second, where correlations are weakened by random permutation of the dataset. We establish theoretical performance guarantees for our proposed algorithms, under both PAC error and mean-squared error. These performance guarantees translate to minimax optimality in several instances, and experiments confirm superior performance of our algorithms over other baseline techniques.

Cite as

Syomantak Chaudhuri and Thomas A. Courtade. Private Estimation When Data and Privacy Demands Are Correlated. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 3:1-3:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{chaudhuri_et_al:LIPIcs.FORC.2025.3,
  author =	{Chaudhuri, Syomantak and Courtade, Thomas A.},
  title =	{{Private Estimation When Data and Privacy Demands Are Correlated}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{3:1--3:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.3},
  URN =		{urn:nbn:de:0030-drops-231305},
  doi =		{10.4230/LIPIcs.FORC.2025.3},
  annote =	{Keywords: Differential Privacy, Personalized Privacy, Heterogeneous Privacy, Correlations in Privacy}
}

Document

DOI: 10.4230/LIPIcs.FORC.2025.5

Pessimism Traps and Algorithmic Interventions

Authors: Avrim Blum, Emily Diana, Kavya Ravichandran, and Alexander Tolbert

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)

Abstract

In this paper, we relate the philosophical literature on pessimism traps to information cascades, a formal model derived from the economics and mathematics literature. A pessimism trap is a social pattern in which individuals in a community, in situations of uncertainty, copy the sub-optimal actions of others, despite their individual beliefs. This maps nicely onto the concept of an information cascade, which involves a sequence of agents making a decision between two alternatives, with a private signal of the superior alternative and a public history of others' actions. Key results from the economics literature show that information cascades occur with probability one in many contexts, and depending on the strength of the signal, populations can fall into the incorrect cascade very easily and quickly. Once formed, in the absence of external perturbation, a cascade cannot be broken; therefore, we derive an intervention that can be used to nudge a population from an incorrect to a correct cascade and, importantly, maintain the cascade once the subsidy is discontinued. We extend this to the case of multiple communities, each of which might have a different optimal action, and a government providing subsidies that cannot discriminate between communities and does not know which action is optimal for each. We study this both theoretically and empirically.

Cite as

Avrim Blum, Emily Diana, Kavya Ravichandran, and Alexander Tolbert. Pessimism Traps and Algorithmic Interventions. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{blum_et_al:LIPIcs.FORC.2025.5,
  author =	{Blum, Avrim and Diana, Emily and Ravichandran, Kavya and Tolbert, Alexander},
  title =	{{Pessimism Traps and Algorithmic Interventions}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.5},
  URN =		{urn:nbn:de:0030-drops-231321},
  doi =		{10.4230/LIPIcs.FORC.2025.5},
  annote =	{Keywords: Pessimism trap, opinion dynamics, algorithmic interventions, subsidy, decision-making}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2025.4

Learning Aggregate Queries Defined by First-Order Logic with Counting

Authors: Steffen van Bergerem and Nicole Schweikardt

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

In the logical framework introduced by Grohe and Turán (TOCS 2004) for Boolean classification problems, the instances to classify are tuples from a logical structure, and Boolean classifiers are described by parametric models based on logical formulas. This is a specific scenario for supervised passive learning, where classifiers should be learned based on labelled examples. Existing results in this scenario focus on Boolean classification. This paper presents learnability results beyond Boolean classification. We focus on multiclass classification problems where the task is to assign input tuples to arbitrary integers. To represent such integer-valued classifiers, we use aggregate queries specified by an extension of first-order logic with counting terms called FOC₁. Our main result shows the following: given a database of polylogarithmic degree, within quasi-linear time, we can build an index structure that makes it possible to learn FOC₁-definable integer-valued classifiers in time polylogarithmic in the size of the database and polynomial in the number of training examples.

Cite as

Steffen van Bergerem and Nicole Schweikardt. Learning Aggregate Queries Defined by First-Order Logic with Counting. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 4:1-4:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{vanbergerem_et_al:LIPIcs.ICDT.2025.4,
  author =	{van Bergerem, Steffen and Schweikardt, Nicole},
  title =	{{Learning Aggregate Queries Defined by First-Order Logic with Counting}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{4:1--4:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.4},
  URN =		{urn:nbn:de:0030-drops-229457},
  doi =		{10.4230/LIPIcs.ICDT.2025.4},
  annote =	{Keywords: Supervised learning, multiclass classification problems, counting logic}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2025.46

Data-Driven Solution Portfolios

Authors: Marina Drygala, Silvio Lattanzi, Andreas Maggiori, Miltiadis Stouras, Ola Svensson, and Sergei Vassilvitskii

Published in: LIPIcs, Volume 325, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025)

Abstract

In this paper, we consider a new problem of portfolio optimization using stochastic information. In a setting where there is some uncertainty, we ask how to best select k potential solutions, with the goal of optimizing the value of the best solution. More formally, given a combinatorial problem Π, a set of value functions 𝒱 over the solutions of Π, and a distribution 𝒟 over 𝒱, our goal is to select k solutions of Π that maximize or minimize the expected value of the best of those solutions. For a simple example, consider the classic knapsack problem: given a universe of elements each with unit weight and a positive value, the task is to select r elements maximizing the total value. Now suppose that each element’s weight comes from a (known) distribution. How should we select k different solutions so that one of them is likely to yield a high value? In this work, we tackle this basic problem, and generalize it to the setting where the underlying set system forms a matroid. On the technical side, it is clear that the candidate solutions we select must be diverse and anti-correlated; however, it is not clear how to do so efficiently. Our main result is a polynomial-time algorithm that constructs a portfolio within a constant factor of the optimal.

Cite as

Marina Drygala, Silvio Lattanzi, Andreas Maggiori, Miltiadis Stouras, Ola Svensson, and Sergei Vassilvitskii. Data-Driven Solution Portfolios. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 46:1-46:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{drygala_et_al:LIPIcs.ITCS.2025.46,
  author =	{Drygala, Marina and Lattanzi, Silvio and Maggiori, Andreas and Stouras, Miltiadis and Svensson, Ola and Vassilvitskii, Sergei},
  title =	{{Data-Driven Solution Portfolios}},
  booktitle =	{16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
  pages =	{46:1--46:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-361-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{325},
  editor =	{Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.46},
  URN =		{urn:nbn:de:0030-drops-226740},
  doi =		{10.4230/LIPIcs.ITCS.2025.46},
  annote =	{Keywords: solution portfolios, data-driven algorithm design, matroids}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2025.55

Completeness Theorems for k-SUM and Geometric Friends: Deciding Fragments of Linear Integer Arithmetic

Authors: Geri Gokaj and Marvin Künnemann

Published in: LIPIcs, Volume 325, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025)

Abstract

In the last three decades, the k-SUM hypothesis has emerged as a satisfying explanation of long-standing time barriers for a variety of algorithmic problems. Yet to this day, the literature knows of only a few proven consequences of a refutation of this hypothesis. Taking a descriptive complexity viewpoint, we ask: What is the largest logically defined class of problems captured by the k-SUM problem? To this end, we introduce a class FOP_ℤ of problems corresponding to deciding sentences in Presburger arithmetic/linear integer arithmetic over finite subsets of integers. We establish two large fragments for which the k-SUM problem is complete under fine-grained reductions: 1) The k-SUM problem is complete for deciding the sentences with k existential quantifiers. 2) The 3-SUM problem is complete for all 3-quantifier sentences of FOP_ℤ expressible using at most 3 linear inequalities. Specifically, a faster-than-n^{⌈k/2⌉ ± o(1)} algorithm for k-SUM (or faster-than-n^{2 ± o(1)} algorithm for 3-SUM, respectively) directly translates to polynomial speedups of a general algorithm for all sentences in the respective fragment. Observing a barrier for proving completeness of 3-SUM for the entire class FOP_ℤ, we turn to the question of which other, seemingly more general, problems are complete for FOP_ℤ. In this direction, we establish FOP_ℤ-completeness of the problem pair of Pareto Sum Verification and Hausdorff Distance under n Translations under the L_∞/L₁ norm in ℤ^d. In particular, our results invite investigating Pareto Sum Verification as a high-dimensional generalization of 3-SUM.

Cite as

Geri Gokaj and Marvin Künnemann. Completeness Theorems for k-SUM and Geometric Friends: Deciding Fragments of Linear Integer Arithmetic. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 55:1-55:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{gokaj_et_al:LIPIcs.ITCS.2025.55,
  author =	{Gokaj, Geri and K{\"u}nnemann, Marvin},
  title =	{{Completeness Theorems for k-SUM and Geometric Friends: Deciding Fragments of Linear Integer Arithmetic}},
  booktitle =	{16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
  pages =	{55:1--55:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-361-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{325},
  editor =	{Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.55},
  URN =		{urn:nbn:de:0030-drops-226835},
  doi =		{10.4230/LIPIcs.ITCS.2025.55},
  annote =	{Keywords: fine-grained complexity theory, descriptive complexity, presburger arithmetic, completeness results, k-SUM}
}

Document

DOI: 10.4230/LIPIcs.CSL.2025.17

Description Complexity of Unary Structures in First-Order Logic with Links to Entropy

Authors: Reijo Jaakkola, Antti Kuusisto, and Miikka Vilander

Published in: LIPIcs, Volume 326, 33rd EACSL Annual Conference on Computer Science Logic (CSL 2025)

Abstract

The description complexity of a model is the length of the shortest formula that defines the model. We study the description complexity of unary structures in first-order logic FO, also drawing links to semantic complexity in the form of entropy. The class of unary structures provides, e.g., a simple way to represent tabular Boolean data sets as relational structures. We define structures with FO-formulas that are strictly linear in the size of the model as opposed to using the naive quadratic ones, and we use arguments based on formula size games to obtain related lower bounds for description complexity. For a typical structure the upper and lower bounds in fact match up to a sublinear term, leading to a precise asymptotic result on the expected description complexity of a randomly selected structure. We then give bounds on the relationship between Shannon entropy and description complexity. We extend this relationship also to Boltzmann entropy by establishing an asymptotic match between the two entropies. Despite the simplicity of unary structures, our arguments require the use of formula size games, Stirling’s approximation and Chernoff bounds.

Cite as

Reijo Jaakkola, Antti Kuusisto, and Miikka Vilander. Description Complexity of Unary Structures in First-Order Logic with Links to Entropy. In 33rd EACSL Annual Conference on Computer Science Logic (CSL 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 326, pp. 17:1-17:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{jaakkola_et_al:LIPIcs.CSL.2025.17,
  author =	{Jaakkola, Reijo and Kuusisto, Antti and Vilander, Miikka},
  title =	{{Description Complexity of Unary Structures in First-Order Logic with Links to Entropy}},
  booktitle =	{33rd EACSL Annual Conference on Computer Science Logic (CSL 2025)},
  pages =	{17:1--17:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-362-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{326},
  editor =	{Endrullis, J{\"o}rg and Schmitz, Sylvain},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CSL.2025.17},
  URN =		{urn:nbn:de:0030-drops-227749},
  doi =		{10.4230/LIPIcs.CSL.2025.17},
  annote =	{Keywords: formula size, finite model theory, formula size games, entropy, randomness}
}

25 Search Results for "Balcan, Maria-Florina"
