14 Search Results for "Sintos, Stavros"


Document
Max-Distance Sparsification for Diversification and Clustering

Authors: Soh Kumabe

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)


Abstract
Let 𝒟 be a set family that is the solution domain of some combinatorial problem. The max-min diversification problem on 𝒟 is the problem of selecting k sets from 𝒟 such that the Hamming distance between any two selected sets is at least d. FPT algorithms parameterized by k+𝓁, where 𝓁 = max_{D ∈ 𝒟}|D|, and by k+d have been actively studied recently for several specific domains. This paper provides unified algorithmic frameworks to solve this problem. Specifically, for each of the parameterizations k+𝓁 and k+d, we provide an FPT oracle algorithm for the max-min diversification problem using oracles related to 𝒟. We then demonstrate that our frameworks provide the first FPT algorithms on several new domains 𝒟, including the domains of t-linear matroid intersection, almost 2-SAT, minimum edge s,t-flows, vertex sets of s,t-mincuts, vertex sets of edge bipartization, and Steiner trees. We also demonstrate that our frameworks generalize most of the existing domain-specific tractability results. Our main technical breakthrough is introducing the notion of a max-distance sparsifier of 𝒟, a domain on which the max-min diversification problem is equivalent to the same problem on the original domain 𝒟. The core of our framework is to design FPT oracle algorithms that construct a constant-size max-distance sparsifier of 𝒟. Using max-distance sparsifiers, we provide FPT algorithms for the max-min and max-sum diversification problems on 𝒟, as well as for the k-center and k-sum-of-radii clustering problems on 𝒟, which are also natural problems in the context of diversification and are of independent interest.
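A brute-force reading of the definition above, as an illustrative sketch only: the function names and the toy domain are ours, and the paper's contribution is FPT oracle algorithms, not this exponential search over k-subsets.

from itertools import combinations

def hamming(A, B):
    # Hamming distance between two sets = size of their symmetric difference.
    return len(A ^ B)

def max_min_diverse(domain, k, d):
    # Return k sets from `domain` whose pairwise Hamming distance is >= d, if any exist.
    for cand in combinations(domain, k):
        if all(hamming(set(X), set(Y)) >= d for X, Y in combinations(cand, 2)):
            return [set(X) for X in cand]
    return None

# Toy solution domain given explicitly as frozensets.
D = [frozenset({1, 2}), frozenset({1, 3}), frozenset({4, 5}), frozenset({2, 5})]
print(max_min_diverse(D, k=2, d=3))   # [{1, 2}, {4, 5}]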

Cite as

Soh Kumabe. Max-Distance Sparsification for Diversification and Clustering. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 46:1-46:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{kumabe:LIPIcs.ESA.2025.46,
  author =	{Kumabe, Soh},
  title =	{{Max-Distance Sparsification for Diversification and Clustering}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{46:1--46:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.46},
  URN =		{urn:nbn:de:0030-drops-245146},
  doi =		{10.4230/LIPIcs.ESA.2025.46},
  annote =	{Keywords: Fixed-Parameter Tractability, Diversification, Clustering}
}
Document
RANDOM
Sublinear Space Graph Algorithms in the Continual Release Model

Authors: Alessandro Epasto, Quanquan C. Liu, Tamalika Mukherjee, and Felix Zhou

Published in: LIPIcs, Volume 353, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)


Abstract
The graph continual release model of differential privacy seeks to produce differentially private solutions to graph problems under a stream of edge updates, where a new private solution is released after each update. Previously known edge-differentially private algorithms for most graph problems in the continual release setting, including densest subgraph and matchings, only output real-valued estimates (not vertex-subset solutions) and do not use sublinear space. Instead, they rely on computing exact graph statistics on the input [Hendrik Fichtenberger et al., 2021; Shuang Song et al., 2018]. In this paper, we leverage sparsification to address the above shortcomings for edge-insertion streams. Our edge-differentially private algorithms use sublinear space with respect to the number of edges in the graph, while some also achieve space sublinear in the number of vertices. In addition, for the densest subgraph problem, we also output edge-differentially private vertex-subset solutions; no previous graph algorithms in the continual release model output such subsets. We make novel use of assorted sparsification techniques from the non-private streaming and static graph algorithms literature to achieve new results in the sublinear-space, continual release setting. This includes algorithms for densest subgraph and maximum matching, as well as the first continual release k-core decomposition algorithm. We also develop a novel sparse level data structure for k-core decomposition that may be of independent interest. To complement our insertion-only algorithms, we conclude with polynomial additive error lower bounds for edge-privacy in the fully dynamic setting, where only logarithmic lower bounds were previously known.
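To fix the interface the abstract refers to, here is a toy continual-release loop for an edge-insertion stream. It is entirely illustrative: the names are ours, the released statistic is a plain edge count, and the naive fresh-Laplace-noise-per-release composition degrades privacy over long streams; the paper's sparsification-based algorithms are far more refined.

import math, random

def laplace_noise(scale):
    # Sample from a Laplace(0, scale) distribution via inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def continual_release_edge_count(edge_stream, epsilon):
    # After every edge insertion, release a noisy running edge count.
    count = 0
    for e in edge_stream:
        count += 1
        yield e, count + laplace_noise(1.0 / epsilon)   # fresh noise per release

for edge, noisy_count in continual_release_edge_count([(1, 2), (2, 3), (1, 3)], epsilon=1.0):
    print(edge, round(noisy_count, 2))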

Cite as

Alessandro Epasto, Quanquan C. Liu, Tamalika Mukherjee, and Felix Zhou. Sublinear Space Graph Algorithms in the Continual Release Model. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 353, pp. 40:1-40:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{epasto_et_al:LIPIcs.APPROX/RANDOM.2025.40,
  author =	{Epasto, Alessandro and Liu, Quanquan C. and Mukherjee, Tamalika and Zhou, Felix},
  title =	{{Sublinear Space Graph Algorithms in the Continual Release Model}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)},
  pages =	{40:1--40:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-397-3},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{353},
  editor =	{Ene, Alina and Chattopadhyay, Eshan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2025.40},
  URN =		{urn:nbn:de:0030-drops-244064},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2025.40},
  annote =	{Keywords: Differential Privacy, Continual Release, Densest Subgraph, k-Core Decomposition, Maximum Matching}
}
Document
APPROX
Relational Approximations for Subspace Primitives

Authors: Xiang Liu and Kasturi Varadarajan

Published in: LIPIcs, Volume 353, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)


Abstract
We explore fundamental geometric computations on point sets that are given to the algorithm implicitly. In particular, we are given a database which is a collection of tables with numerical values, and the geometric computation is to be performed on the join of the tables. Explicitly computing this join takes time exponential in the size of the tables. We are therefore interested in geometric problems that can be solved by algorithms whose running time is polynomial in the size of the tables. Such relational algorithms are typically not able to explicitly compute the join. To avoid the NP-completeness bottleneck, researchers assume that the tables have a tractable combinatorial structure, such as being acyclic. Even with this assumption, simple geometric computations turn out to be non-trivial and sometimes intractable. In this article, we study the problem of computing the maximum distance from a point in the join to a given subspace, and we develop approximation algorithms for this NP-hard problem.
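The geometric primitive itself is simple once the point set is explicit. The sketch below (with our own names and toy data) computes point-to-subspace distances by orthogonal projection and takes the maximum over an explicitly materialized point set, which is exactly the materialization the paper's relational algorithms are designed to avoid.

import numpy as np

def distance_to_subspace(p, basis):
    # Distance from point p to the linear subspace spanned by the rows of `basis`.
    Q, _ = np.linalg.qr(np.asarray(basis, dtype=float).T)   # orthonormal columns
    proj = Q @ (Q.T @ p)                                     # orthogonal projection of p
    return float(np.linalg.norm(p - proj))

def max_distance(points, basis):
    return max(distance_to_subspace(np.asarray(p, dtype=float), basis) for p in points)

# Example: maximum distance of points in R^3 to the xy-plane (spanned by e1, e2).
pts = [(1.0, 2.0, 3.0), (0.0, 0.0, -5.0)]
print(max_distance(pts, basis=[(1, 0, 0), (0, 1, 0)]))   # 5.0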

Cite as

Xiang Liu and Kasturi Varadarajan. Relational Approximations for Subspace Primitives. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 353, pp. 12:1-12:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{liu_et_al:LIPIcs.APPROX/RANDOM.2025.12,
  author =	{Liu, Xiang and Varadarajan, Kasturi},
  title =	{{Relational Approximations for Subspace Primitives}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)},
  pages =	{12:1--12:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-397-3},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{353},
  editor =	{Ene, Alina and Chattopadhyay, Eshan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2025.12},
  URN =		{urn:nbn:de:0030-drops-243781},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2025.12},
  annote =	{Keywords: relational algorithm, Euclidean distance, subspace approximation}
}
Document
Enumeration of Minimal Hitting Sets Parameterized by Treewidth

Authors: Batya Kenig and Dan Shlomo Mizrahi

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)


Abstract
Enumerating the minimal hitting sets of a hypergraph is a problem which arises in many data management applications that include constraint mining, discovering unique column combinations, and enumerating database repairs. Previously, Eiter et al. [Thomas Eiter et al., 2003] showed that the minimal hitting sets of an n-vertex hypergraph, with treewidth w, can be enumerated with delay O^*(n^w) (ignoring polynomial factors), with space requirements that scale with the output size. We improve this to fixed-parameter-linear delay, following an FPT preprocessing phase. The memory consumption of our algorithm is exponential with respect to the treewidth of the hypergraph.
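For reference, the objects being enumerated can be generated by brute force on a small hypergraph; the sketch below (our own names and toy data) lists every inclusion-minimal hitting set in order of size, with none of the delay or space guarantees that are the point of the paper.

from itertools import combinations

def hits(S, hyperedges):
    return all(S & e for e in hyperedges)

def minimal_hitting_sets(vertices, hyperedges):
    # Yield every inclusion-minimal hitting set, smallest first.
    found = []
    for size in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), size):
            S = set(cand)
            # S is minimal iff it hits every edge and contains no smaller hitting set.
            if hits(S, hyperedges) and not any(F < S for F in found):
                found.append(S)
                yield S

H = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
for S in minimal_hitting_sets({1, 2, 3, 4}, H):
    print(sorted(S))   # [1, 3], [2, 3], [2, 4]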

Cite as

Batya Kenig and Dan Shlomo Mizrahi. Enumeration of Minimal Hitting Sets Parameterized by Treewidth. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 8:1-8:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{kenig_et_al:LIPIcs.ICDT.2025.8,
  author =	{Kenig, Batya and Mizrahi, Dan Shlomo},
  title =	{{Enumeration of Minimal Hitting Sets Parameterized by Treewidth}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{8:1--8:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.8},
  URN =		{urn:nbn:de:0030-drops-229498},
  doi =		{10.4230/LIPIcs.ICDT.2025.8},
  annote =	{Keywords: Enumeration, Hitting sets}
}
Document
Database Theory in Action
Database Theory in Action: Cypher, GQL, and Regular Path Queries

Authors: Amélie Gheerbrant, Leonid Libkin, Liat Peterfreund, and Alexandra Rogova

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)


Abstract
Cypher has so far been the most commonly used query language for property graphs, and served as the foundation of the recently standardized graph query language GQL. In designing the features of GQL, the standards committee addressed the perceived limitations of Cypher. One such limitation is the inability of Cypher, as originally designed, to express all regular path queries (RPQs). Despite this claim having been stated many times as a folklore result, we could not find any proof of it. In this note we formalize the core of Cypher’s pattern matching and formally prove that indeed it falls short of all RPQs, justifying the inclusion of new pattern matching features in GQL.

Cite as

Amélie Gheerbrant, Leonid Libkin, Liat Peterfreund, and Alexandra Rogova. Database Theory in Action: Cypher, GQL, and Regular Path Queries. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 36:1-36:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{gheerbrant_et_al:LIPIcs.ICDT.2025.36,
  author =	{Gheerbrant, Am\'{e}lie and Libkin, Leonid and Peterfreund, Liat and Rogova, Alexandra},
  title =	{{Database Theory in Action: Cypher, GQL, and Regular Path Queries}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{36:1--36:5},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.36},
  URN =		{urn:nbn:de:0030-drops-229771},
  doi =		{10.4230/LIPIcs.ICDT.2025.36},
  annote =	{Keywords: Regular path queries, Cypher, GQL, inexpressibility}
}
Document
Edge-Minimum Walk of Modular Length in Polynomial Time

Authors: Antoine Amarilli, Benoît Groz, and Nicole Wein

Published in: LIPIcs, Volume 325, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025)


Abstract
We study the problem of finding, in a directed graph, an st-walk of length r mod q which is edge-minimum, i.e., uses the smallest number of distinct edges. Despite the vast literature on paths and cycles with modularity constraints, to the best of our knowledge we are the first to study this problem. Our main result is a polynomial-time algorithm that solves this task when r and q are constants. We also show how our proof technique gives an algorithm to solve a generalization of the well-known Directed Steiner Network problem, in which connections between endpoint pairs are required to satisfy modularity constraints on their length. Our algorithm is polynomial when the number of endpoint pairs and the modularity constraints on the pairs are constants. In this version of the article, proofs and examples are omitted because of space constraints. Detailed proofs are available in the full version [Antoine Amarilli et al., 2024].
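The modularity constraint in isolation is easy to handle: breadth-first search over the product graph of (vertex, walk length mod q) finds some directed st-walk of length congruent to r modulo q, as in the sketch below (our own names and toy graph). Finding the walk that uses the fewest distinct edges, which is the problem studied in the paper, is substantially harder.

from collections import deque

def st_walk_mod(adj, s, t, r, q):
    # adj: dict mapping a vertex to its out-neighbours. Returns a walk as a vertex list.
    start = (s, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        v, m = queue.popleft()
        if v == t and m == r % q:
            walk, state = [], (v, m)
            while state is not None:          # reconstruct the walk back to (s, 0)
                walk.append(state[0])
                state = parent[state]
            return walk[::-1]
        for w in adj.get(v, []):
            nxt = (w, (m + 1) % q)
            if nxt not in parent:
                parent[nxt] = (v, m)
                queue.append(nxt)
    return None

adj = {0: [1], 1: [2, 0], 2: [0]}
print(st_walk_mod(adj, s=0, t=2, r=0, q=3))   # [0, 1, 0, 1, 0, 1, 2], a walk of length 6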

Cite as

Antoine Amarilli, Benoît Groz, and Nicole Wein. Edge-Minimum Walk of Modular Length in Polynomial Time. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 5:1-5:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


BibTeX

@InProceedings{amarilli_et_al:LIPIcs.ITCS.2025.5,
  author =	{Amarilli, Antoine and Groz, Beno\^{i}t and Wein, Nicole},
  title =	{{Edge-Minimum Walk of Modular Length in Polynomial Time}},
  booktitle =	{16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
  pages =	{5:1--5:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-361-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{325},
  editor =	{Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.5},
  URN =		{urn:nbn:de:0030-drops-226330},
  doi =		{10.4230/LIPIcs.ITCS.2025.5},
  annote =	{Keywords: Directed Steiner Network, Modularity}
}
Document
Range Entropy Queries and Partitioning

Authors: Sanjay Krishnan and Stavros Sintos

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
Data partitioning that maximizes or minimizes Shannon entropy is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partitioning algorithms can be accelerated if we have a data structure that finds the entropy of different subsets of the data whenever the algorithm needs to decide which block to construct. While it is generally known how to compute the entropy of a discrete distribution efficiently, we want to efficiently derive the entropy among the data items that lie in a specific area. We solve this problem in a setting typical of real data, where data items are geometric points and each requested area is a query (hyper)rectangle. More specifically, we consider a set P of n weighted and colored points in ℝ^d. The goal is to construct a low-space data structure such that, given a query (hyper)rectangle R, it computes the entropy based on the colors of the points in P ∩ R in sublinear time. We show a conditional lower bound for this problem, proving that we cannot hope for data structures with near-linear space and near-constant query time. Then, we propose exact data structures for d = 1 and d > 1 with o(n^{2d}) space and o(n) query time. We also provide a tuning parameter t that the user can choose to bound the asymptotic space and query time of the new data structures. Next, we propose near-linear-space data structures for returning either an additive or a multiplicative approximation of the entropy. Finally, we show how we can use the new data structures to efficiently partition time series and histograms with respect to entropy.
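For reference, the quantity these data structures answer can be computed by the linear scan the paper is designed to avoid; the sketch below (our own names and toy data) returns the weighted Shannon entropy of the colors of the points falling inside a query rectangle.

import math
from collections import defaultdict

def range_entropy(points, rect):
    # points: list of (coords, color, weight); rect: (lower corner, upper corner).
    lo, hi = rect
    mass = defaultdict(float)
    for coords, color, w in points:
        if all(l <= c <= h for c, l, h in zip(coords, lo, hi)):
            mass[color] += w
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return -sum((m / total) * math.log2(m / total) for m in mass.values())

pts = [((1, 1), "red", 1.0), ((2, 3), "blue", 1.0), ((5, 5), "red", 2.0)]
print(range_entropy(pts, rect=((0, 0), (3, 4))))   # entropy of {red: 1, blue: 1} = 1.0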

Cite as

Sanjay Krishnan and Stavros Sintos. Range Entropy Queries and Partitioning. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 6:1-6:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{krishnan_et_al:LIPIcs.ICDT.2024.6,
  author =	{Krishnan, Sanjay and Sintos, Stavros},
  title =	{{Range Entropy Queries and Partitioning}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{6:1--6:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.6},
  URN =		{urn:nbn:de:0030-drops-197883},
  doi =		{10.4230/LIPIcs.ICDT.2024.6},
  annote =	{Keywords: Shannon entropy, range query, data structure, data partitioning}
}
Document
Computing Data Distribution from Query Selectivities

Authors: Pankaj K. Agarwal, Rahul Raychaudhury, Stavros Sintos, and Jun Yang

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
We are given a set 𝒵 = {(R_1,s_1), …, (R_n,s_n)}, where each R_i is a range in ℝ^d, such as a rectangle or a ball, and s_i ∈ [0,1] denotes its selectivity. The goal is to compute a small-size discrete data distribution 𝒟 = {(q₁,w₁),…, (q_m,w_m)}, where q_j ∈ ℝ^d and w_j ∈ [0,1] for each 1 ≤ j ≤ m, and ∑_{1≤j≤m} w_j = 1, such that 𝒟 is the most consistent with 𝒵, i.e., err_p(𝒟,𝒵) = 1/n ∑_{i = 1}ⁿ |s_i - ∑_{j=1}^m w_j⋅1(q_j ∈ R_i)|^p is minimized. In a database setting, 𝒵 corresponds to a workload of range queries over some table, together with their observed selectivities (i.e., the fraction of tuples returned), and 𝒟 can be used as a compact model for approximating the data distribution within the table without accessing the underlying contents. In this paper, we obtain both upper and lower bounds for this problem. In particular, we show that the problem of finding the best data distribution from selectivity queries is NP-complete. On the positive side, we describe a Monte Carlo algorithm that constructs, in time O((n+δ^{-d}) δ^{-2} polylog n), a discrete distribution 𝒟̃ of size O(δ^{-2}), such that err_p(𝒟̃,𝒵) ≤ min_𝒟 err_p(𝒟,𝒵)+δ (for p = 1,2,∞) where the minimum is taken over all discrete distributions. We also establish conditional lower bounds, which strongly indicate the infeasibility of relative approximations as well as removal of the exponential dependency on the dimension for additive approximations. This suggests that significant improvements to our algorithm are unlikely.
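The objective err_p is a closed-form expression, so it can be evaluated directly for any candidate distribution. The sketch below transcribes it for finite p over axis-aligned rectangular ranges (names and toy data are ours; the paper's contribution is constructing a near-optimal 𝒟, not evaluating the error).

def err_p(D, Z, p=1):
    # D: list of (point, weight); Z: list of (rect, selectivity), rect = (lo, hi) corners.
    def inside(q, rect):
        lo, hi = rect
        return all(l <= c <= h for c, l, h in zip(q, lo, hi))
    total = 0.0
    for rect, s in Z:
        est = sum(w for q, w in D if inside(q, rect))   # estimated selectivity under D
        total += abs(s - est) ** p
    return total / len(Z)

D = [((0.2, 0.2), 0.5), ((0.8, 0.8), 0.5)]           # two weighted points, weights sum to 1
Z = [(((0.0, 0.0), (0.5, 0.5)), 0.4),                # observed selectivity 0.4
     (((0.0, 0.0), (1.0, 1.0)), 1.0)]
print(err_p(D, Z, p=1))                              # (|0.4 - 0.5| + |1.0 - 1.0|) / 2 = 0.05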

Cite as

Pankaj K. Agarwal, Rahul Raychaudhury, Stavros Sintos, and Jun Yang. Computing Data Distribution from Query Selectivities. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 18:1-18:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{agarwal_et_al:LIPIcs.ICDT.2024.18,
  author =	{Agarwal, Pankaj K. and Raychaudhury, Rahul and Sintos, Stavros and Yang, Jun},
  title =	{{Computing Data Distribution from Query Selectivities}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{18:1--18:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.18},
  URN =		{urn:nbn:de:0030-drops-198007},
  doi =		{10.4230/LIPIcs.ICDT.2024.18},
  annote =	{Keywords: selectivity queries, discrete distributions, Multiplicative Weights Update, eps-approximation, learnable functions, depth problem, arrangement}
}
Document
Finding Smallest Witnesses for Conjunctive Queries

Authors: Xiao Hu and Stavros Sintos

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)


Abstract
A witness is a sub-database that preserves the query results of the original database but is of much smaller size. It has wide applications in query rewriting and debugging, query explanation, IoT analytics, multi-layer network routing, etc. In this paper, we study the smallest witness problem (SWP) for the class of conjunctive queries (CQs) without self-joins. We first establish the dichotomy that SWP for a CQ can be computed in polynomial time if and only if the CQ has the head-cluster property, unless P = NP. We next turn to the approximate version, relaxing the requirement that the witness be of minimum size. We surprisingly find that the head-domination property, previously identified for the deletion propagation problem [Kimelfeld et al., 2012], also precisely captures the hardness of the approximate smallest witness problem. In polynomial time, SWP for any CQ with the head-domination property can be approximated within a constant factor, while SWP for any CQ without this property cannot be approximated within a logarithmic factor, unless P = NP. We further explore efficient approximation algorithms for CQs without the head-domination property: (1) we show a trivial algorithm which achieves a polynomially large approximation ratio for general CQs; (2) for any CQ with only one non-output attribute, such as star CQs, we show a greedy algorithm with a logarithmic approximation ratio; (3) for line CQs, which contain at least two non-output attributes, we relate SWP to the directed Steiner forest problem, whose algorithms can be applied to line CQs directly. Meanwhile, we establish a much higher lower bound, exponentially larger than the logarithmic lower bound obtained above. It remains open to close the gap between the lower and upper bounds of the approximate SWP for CQs without the head-domination property.
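To make the definition concrete, the sketch below computes a smallest witness by brute force for a tiny self-join-free CQ Q(x) :- R(x, y), S(y, z) whose head keeps only x. All names and toy data are ours; the paper characterizes when this is tractable and how well it can be approximated, rather than solving it by enumeration.

from itertools import combinations

def answers(R, S):
    # Q(x) :- R(x, y), S(y, z): output only the x attribute.
    return {x for (x, y) in R for (y2, z) in S if y == y2}

def smallest_witness(R, S):
    # Smallest sub-database (R', S') with exactly the same query answers as (R, S).
    goal = answers(R, S)
    for size in range(len(R) + len(S) + 1):
        for r_size in range(size + 1):
            for Rp in combinations(R, r_size):
                for Sp in combinations(S, size - r_size):
                    if answers(Rp, Sp) == goal:
                        return list(Rp), list(Sp)

R = [(1, "a"), (2, "a"), (3, "b")]
S = [("a", 10), ("a", 20), ("b", 30)]
print(smallest_witness(R, S))   # keeps 5 of the 6 tuples: all of R plus one S-tuple per join value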

Cite as

Xiao Hu and Stavros Sintos. Finding Smallest Witnesses for Conjunctive Queries. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 24:1-24:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{hu_et_al:LIPIcs.ICDT.2024.24,
  author =	{Hu, Xiao and Sintos, Stavros},
  title =	{{Finding Smallest Witnesses for Conjunctive Queries}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{24:1--24:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.24},
  URN =		{urn:nbn:de:0030-drops-198066},
  doi =		{10.4230/LIPIcs.ICDT.2024.24},
  annote =	{Keywords: conjunctive query, smallest witness, head-cluster, head-domination}
}
Document
Track A: Algorithms, Complexity and Games
Dynamic Enumeration of Similarity Joins

Authors: Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, and Jun Yang

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)


Abstract
This paper considers enumerating answers to similarity-join queries under dynamic updates: given two sets of n points A,B in ℝ^d, a metric ϕ(⋅), and a distance threshold r > 0, report all pairs of points (a, b) ∈ A × B with ϕ(a,b) ≤ r. Our goal is to store A,B in a dynamic data structure that, whenever asked, can enumerate all result pairs with a worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from A or B. We propose several efficient data structures for answering similarity-join queries in low dimensions. For exact enumeration of the similarity join, we present near-linear-size data structures for the 𝓁₁ and 𝓁_∞ metrics with log^{O(1)} n update time and delay. We show that such a data structure is not feasible for the 𝓁₂ metric for d ≥ 4. For approximate enumeration of the similarity join, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for the 𝓁_p metric, with log^{O(1)} n delay and update time. In high dimensions, we present an efficient data structure with a worst-case delay guarantee using locality-sensitive hashing (LSH).
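A simple static grid conveys the flavour of the exact problem for the 𝓁_∞ metric: bucket the points of B into cells of side r and compare each point of A only with neighbouring cells, as sketched below (our own names and toy data). This gives neither the worst-case delay guarantee nor the polylogarithmic update time of the paper's dynamic structures.

from collections import defaultdict
from itertools import product

def linf(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def similarity_join(A, B, r):
    # Report all pairs (a, b) in A x B with l_infinity distance at most r.
    d = len(B[0]) if B else 0
    grid = defaultdict(list)
    for b in B:
        grid[tuple(int(c // r) for c in b)].append(b)
    for a in A:
        cell = tuple(int(c // r) for c in a)
        for off in product((-1, 0, 1), repeat=d):                 # neighbouring cells only
            for b in grid.get(tuple(c + o for c, o in zip(cell, off)), []):
                if linf(a, b) <= r:
                    yield a, b

A = [(0.0, 0.0), (5.0, 5.0)]
B = [(0.4, 0.3), (4.9, 5.2), (9.0, 9.0)]
print(list(similarity_join(A, B, r=0.5)))   # [((0.0, 0.0), (0.4, 0.3)), ((5.0, 5.0), (4.9, 5.2))]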

Cite as

Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, and Jun Yang. Dynamic Enumeration of Similarity Joins. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 11:1-11:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


BibTeX

@InProceedings{agarwal_et_al:LIPIcs.ICALP.2021.11,
  author =	{Agarwal, Pankaj K. and Hu, Xiao and Sintos, Stavros and Yang, Jun},
  title =	{{Dynamic Enumeration of Similarity Joins}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{11:1--11:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.11},
  URN =		{urn:nbn:de:0030-drops-140803},
  doi =		{10.4230/LIPIcs.ICALP.2021.11},
  annote =	{Keywords: dynamic enumeration, similarity joins, worst-case delay guarantee}
}
Document
APPROX
The Maximum Exposure Problem

Authors: Neeraj Kumar, Stavros Sintos, and Subhash Suri

Published in: LIPIcs, Volume 145, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)


Abstract
Given a set of points P and axis-aligned rectangles R in the plane, a point p in P is called exposed if it lies outside all rectangles in R. In the max-exposure problem, given an integer parameter k, we want to delete k rectangles from R so as to maximize the number of exposed points. We show that the problem is NP-hard and, assuming plausible complexity conjectures, also hard to approximate even when the rectangles in R are translates of two fixed rectangles. However, if R only consists of translates of a single rectangle, we present a polynomial-time approximation scheme. For general rectangle range spaces, we present a simple O(k) bicriteria approximation algorithm; that is, by deleting O(k^2) rectangles, we can expose at least Omega(1/k) of the optimal number of points.
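The objective is easy to state as a brute force: try every set of k rectangles to delete and count the points left uncovered, as in the sketch below (our own names and toy data). This is exponential in k, which is precisely why the paper develops approximation algorithms.

from itertools import combinations

def exposed(points, rects):
    # Points lying outside every rectangle in `rects`, given as (lower, upper) corner pairs.
    def covered(p, rect):
        (x1, y1), (x2, y2) = rect
        return x1 <= p[0] <= x2 and y1 <= p[1] <= y2
    return [p for p in points if not any(covered(p, r) for r in rects)]

def max_exposure(points, rects, k):
    # Keep |rects| - k rectangles (i.e. delete exactly k) so as to maximize exposed points.
    best_kept = max(combinations(rects, len(rects) - k),
                    key=lambda kept: len(exposed(points, kept)))
    return len(exposed(points, best_kept)), best_kept

P = [(1, 1), (2, 2), (6, 6)]
R = [((0, 0), (3, 3)), ((1, 1), (2, 2)), ((5, 5), (7, 7))]
print(max_exposure(P, R, k=2))   # (2, ...): deleting the two overlapping rectangles exposes (1,1) and (2,2)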

Cite as

Neeraj Kumar, Stavros Sintos, and Subhash Suri. The Maximum Exposure Problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 19:1-19:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{kumar_et_al:LIPIcs.APPROX-RANDOM.2019.19,
  author =	{Kumar, Neeraj and Sintos, Stavros and Suri, Subhash},
  title =	{{The Maximum Exposure Problem}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
  pages =	{19:1--19:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-125-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{145},
  editor =	{Achlioptas, Dimitris and V\'{e}gh, L\'{a}szl\'{o} A.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2019.19},
  URN =		{urn:nbn:de:0030-drops-112344},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2019.19},
  annote =	{Keywords: max-exposure, PTAS, densest k-subgraphs, geometric constraint removal, Network resilience}
}
Document
Approximating Distance Measures for the Skyline

Authors: Nirman Kumar, Benjamin Raichel, Stavros Sintos, and Gregory Van Buskirk

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
In multi-parameter decision making, data is usually modeled as a set of points whose dimension is the number of parameters, and the skyline or Pareto points represent the possible optimal solutions for various optimization problems. The structure and computation of such points have been well studied, particularly in the database community. As the skyline can be quite large in high dimensions, one often seeks a compact summary. In particular, for a given integer parameter k, a subset of k points is desired which best approximates the skyline under some measure. Various measures have been proposed, but they mostly treat the skyline as a discrete object. By viewing the skyline as a continuous geometric hull, we propose a new measure that evaluates the quality of a subset by the Hausdorff distance of its hull to the full hull. We argue that in many ways our measure more naturally captures what it means to approximate the skyline. For our new geometric skyline approximation measure, we provide a plethora of results. Specifically, we provide (1) a near linear time exact algorithm in two dimensions, (2) APX-hardness results for dimensions three and higher, (3) approximation algorithms for related variants of our problem, and (4) a practical and efficient heuristic which uses our geometric insights into the problem, as well as various experimental results to show the efficacy of our approach.
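As a point of comparison, the sketch below (our own names and toy data) computes a 2-D skyline under maximization and a discrete Hausdorff distance from the full skyline to a candidate k-point subset; note that the paper's measure compares the continuous hulls of the two sets, which this simplified sketch does not construct.

import math

def skyline(points):
    # Points not dominated by any other point (larger is better in both coordinates).
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)]

def hausdorff(A, B):
    d = math.dist
    return max(max(min(d(a, b) for b in B) for a in A),
               max(min(d(b, a) for a in A) for b in B))

P = [(1, 9), (3, 7), (5, 6), (7, 3), (9, 1), (2, 2)]
sky = skyline(P)                      # [(1, 9), (3, 7), (5, 6), (7, 3), (9, 1)]
subset = [(1, 9), (5, 6), (9, 1)]     # a candidate 3-point summary of the skyline
print(hausdorff(sky, subset))         # about 2.83, driven by the unsummarized point (7, 3)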

Cite as

Nirman Kumar, Benjamin Raichel, Stavros Sintos, and Gregory Van Buskirk. Approximating Distance Measures for the Skyline. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{kumar_et_al:LIPIcs.ICDT.2019.10,
  author =	{Kumar, Nirman and Raichel, Benjamin and Sintos, Stavros and Van Buskirk, Gregory},
  title =	{{Approximating Distance Measures for the Skyline}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{10:1--10:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.10},
  URN =		{urn:nbn:de:0030-drops-103125},
  doi =		{10.4230/LIPIcs.ICDT.2019.10},
  annote =	{Keywords: Skyline, Pareto optimal, Approximation, Hardness, Multi-criteria decision making}
}
Document
Computing Shortest Paths in the Plane with Removable Obstacles

Authors: Pankaj K. Agarwal, Neeraj Kumar, Stavros Sintos, and Subhash Suri

Published in: LIPIcs, Volume 101, 16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018)


Abstract
We consider the problem of computing a Euclidean shortest path in the presence of removable obstacles in the plane. In particular, we have a collection of pairwise-disjoint polygonal obstacles, each of which may be removed at some cost c_i > 0. Given a cost budget C > 0 and a pair of points s, t, which obstacles should be removed to minimize the path length from s to t in the remaining workspace? We show that this problem is NP-hard even if the obstacles are vertical line segments. Our main result is a fully polynomial-time approximation scheme (FPTAS) for the case of convex polygons. Specifically, we compute a (1 + epsilon)-approximate shortest path in time O((nh/epsilon^2) log n log(n/epsilon)) with removal cost at most (1+epsilon)C, where h is the number of obstacles, n is the total number of obstacle vertices, and epsilon in (0, 1) is a user-specified parameter. Our approximation scheme also solves a shortest path problem for a stochastic model of obstacles, where each obstacle's presence is an independent event with a known probability. Finally, we also present a data structure that can answer s-t path queries in polylogarithmic time, for any pair of points s, t in the plane.
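A toy, graph-based analogue of the problem is easy to solve exactly when the number of obstacles is tiny: run Dijkstra over states (node, set of obstacles removed so far), rejecting states whose total removal cost exceeds the budget C. All names and the toy instance below are ours; this state space is exponential in the number of obstacles, whereas the paper gives an FPTAS for the continuous, polygonal setting.

import heapq

def shortest_path_with_removals(edges, obstacle_cost, s, t, C):
    # edges: list of undirected (u, v, length, blocking obstacle or None).
    adj = {}
    for u, v, w, ob in edges:
        adj.setdefault(u, []).append((v, w, ob))
        adj.setdefault(v, []).append((u, w, ob))
    pq = [(0.0, s, frozenset())]
    best = {(s, frozenset()): 0.0}
    while pq:
        dist, u, removed = heapq.heappop(pq)
        if dist > best.get((u, removed), float("inf")):
            continue
        if u == t:
            return dist, removed       # shortest feasible length and the obstacles removed
        for v, w, ob in adj.get(u, []):
            newly = removed if ob is None else removed | {ob}
            if sum(obstacle_cost[o] for o in newly) > C:
                continue               # over budget: this edge cannot be used
            state = (v, newly)
            if dist + w < best.get(state, float("inf")):
                best[state] = dist + w
                heapq.heappush(pq, (dist + w, v, newly))
    return None

edges = [("s", "t", 10.0, None),                          # obstacle-free detour
         ("s", "m", 2.0, "A"), ("m", "t", 2.0, None)]     # short route blocked by obstacle A
print(shortest_path_with_removals(edges, {"A": 3.0}, "s", "t", C=5.0))   # (4.0, frozenset({'A'}))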

Cite as

Pankaj K. Agarwal, Neeraj Kumar, Stavros Sintos, and Subhash Suri. Computing Shortest Paths in the Plane with Removable Obstacles. In 16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 101, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{agarwal_et_al:LIPIcs.SWAT.2018.5,
  author =	{Agarwal, Pankaj K. and Kumar, Neeraj and Sintos, Stavros and Suri, Subhash},
  title =	{{Computing Shortest Paths in the Plane with Removable Obstacles}},
  booktitle =	{16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018)},
  pages =	{5:1--5:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-068-2},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{101},
  editor =	{Eppstein, David},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2018.5},
  URN =		{urn:nbn:de:0030-drops-88312},
  doi =		{10.4230/LIPIcs.SWAT.2018.5},
  annote =	{Keywords: Euclidean shortest paths, Removable polygonal obstacles, Stochastic shortest paths, L\underline1 shortest paths}
}
Document
Efficient Algorithms for k-Regret Minimizing Sets

Authors: Pankaj K. Agarwal, Nirman Kumar, Stavros Sintos, and Subhash Suri

Published in: LIPIcs, Volume 75, 16th International Symposium on Experimental Algorithms (SEA 2017)


Abstract
A regret minimizing set Q is a small-size representation of a much larger database P so that user queries executed on Q return answers whose scores are not much worse than those on the full dataset. In particular, a k-regret minimizing set has the property that the regret ratio between the score of the top-1 item in Q and the score of the top-k item in P is minimized, where the score of an item is the inner product of the item's attributes with a user's weight (preference) vector. The problem is challenging because we want to find a single representative set Q whose regret ratio is small with respect to all possible user weight vectors. We show that k-regret minimization is NP-complete for all dimensions d >= 3, settling an open problem from Chester et al. [VLDB 2014]. Our main algorithmic contributions are two approximation algorithms, both with provable guarantees, one based on coresets and another based on hitting sets. We perform an extensive experimental evaluation of our algorithms, using both real-world and synthetic data, and compare their performance against the solution proposed by Chester et al. [VLDB 2014]. The results show that our algorithms are significantly faster and scale to much larger sets than the greedy algorithm of Chester et al., while returning answers of comparable quality.
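The regret ratio being minimized is easy to measure for a given summary: score every point against sampled preference vectors and compare the best of Q with the best of P (k = 1 shown), as in the sketch below. All names and toy data are ours; the paper's algorithms choose Q with provable guarantees, whereas this sketch only evaluates a given Q on sampled directions.

import random

def score(p, w):
    return sum(pi * wi for pi, wi in zip(p, w))

def max_regret_ratio(P, Q, num_dirs=1000, d=2, seed=0):
    # Largest relative loss of Q's best score versus P's best score over sampled weight vectors.
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_dirs):
        w = [rng.random() for _ in range(d)]     # a random nonnegative preference vector
        best_P = max(score(p, w) for p in P)
        best_Q = max(score(q, w) for q in Q)
        if best_P > 0:
            worst = max(worst, (best_P - best_Q) / best_P)
    return worst

P = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.8), (0.3, 0.2)]
Q = [(0.8, 0.8)]                                  # a 1-point summary of P
print(max_regret_ratio(P, Q))                     # approaches 0.2 (worst case at the axis directions)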

Cite as

Pankaj K. Agarwal, Nirman Kumar, Stavros Sintos, and Subhash Suri. Efficient Algorithms for k-Regret Minimizing Sets. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 7:1-7:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


BibTeX

@InProceedings{agarwal_et_al:LIPIcs.SEA.2017.7,
  author =	{Agarwal, Pankaj K. and Kumar, Nirman and Sintos, Stavros and Suri, Subhash},
  title =	{{Efficient Algorithms for k-Regret Minimizing Sets}},
  booktitle =	{16th International Symposium on Experimental Algorithms (SEA 2017)},
  pages =	{7:1--7:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-036-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{75},
  editor =	{Iliopoulos, Costas S. and Pissis, Solon P. and Puglisi, Simon J. and Raman, Rajeev},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2017.7},
  URN =		{urn:nbn:de:0030-drops-76321},
  doi =		{10.4230/LIPIcs.SEA.2017.7},
  annote =	{Keywords: regret minimizing sets, skyline, top-k query, coreset, hitting set}
}