DROPS

Document

DOI: 10.4230/LIPIcs.ICDT.2024.14

The Importance of Parameters in Database Queries

Authors: Martin Grohe, Benny Kimelfeld, Peter Lindner, and Christoph Standke

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

We propose and study a framework for quantifying the importance of the choices of parameter values to the result of a query over a database. These parameters occur as constants in logical queries, such as conjunctive queries. In our framework, the importance of a parameter is its SHAP score. This score is a popular instantiation of the game-theoretic Shapley value to measuring the importance of feature values in machine learning models. We make the case for the rationale of using this score by explaining the intuition behind SHAP, and by showing that we arrive at this score in two different, apparently opposing, approaches to quantifying the contribution of a parameter. The application of the SHAP score requires two components in addition to the query and the database: (a) a probability distribution over the combinations of parameter values, and (b) a utility function that measures the similarity between the result for the original parameters and the result for hypothetical parameters. The main question addressed in the paper is the complexity of calculating the SHAP score for different distributions and similarity measures. We first address the case of probabilistically independent parameters. The problem is hard if we consider a fragment of queries that is hard to evaluate (as one would expect), and even for the fragment of acyclic conjunctive queries. In some cases, though, one can efficiently list all relevant parameter combinations, and then the SHAP score can be computed in polynomial time under reasonable general conditions. Also tractable is the case of full acyclic conjunctive queries for certain (natural) similarity functions. We extend our results to conjunctive queries with inequalities between variables and parameters. Finally, we discuss a simple approximation technique for the case of correlated parameters.

Cite as

Martin Grohe, Benny Kimelfeld, Peter Lindner, and Christoph Standke. The Importance of Parameters in Database Queries. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{grohe_et_al:LIPIcs.ICDT.2024.14,
  author =	{Grohe, Martin and Kimelfeld, Benny and Lindner, Peter and Standke, Christoph},
  title =	{{The Importance of Parameters in Database Queries}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.14},
  URN =		{urn:nbn:de:0030-drops-197966},
  doi =		{10.4230/LIPIcs.ICDT.2024.14},
  annote =	{Keywords: SHAP score, query parameters, Shapley value}
}

Document

DOI: 10.4230/LIPIcs.STACS.2024.19

The Complexity of Homomorphism Reconstructibility

Authors: Jan Böker, Louis Härtel, Nina Runde, Tim Seppelt, and Christoph Standke

Published in: LIPIcs, Volume 289, 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024)

Abstract

Representing graphs by their homomorphism counts has led to the beautiful theory of homomorphism indistinguishability in recent years. Moreover, homomorphism counts have promising applications in database theory and machine learning, where one would like to answer queries or classify graphs solely based on the representation of a graph G as a finite vector of homomorphism counts from some fixed finite set of graphs to G. We study the computational complexity of the arguably most fundamental computational problem associated to these representations, the homomorphism reconstructability problem: given a finite sequence of graphs and a corresponding vector of natural numbers, decide whether there exists a graph G that realises the given vector as the homomorphism counts from the given graphs. We show that this problem yields a natural example of an NP^#𝖯-hard problem, which still can be NP-hard when restricted to a fixed number of input graphs of bounded treewidth and a fixed input vector of natural numbers, or alternatively, when restricted to a finite input set of graphs. We further show that, when restricted to a finite input set of graphs and given an upper bound on the order of the graph G as additional input, the problem cannot be NP-hard unless 𝖯 = NP. For this regime, we obtain partial positive results. We also investigate the problem’s parameterised complexity and provide fpt-algorithms for the case that a single graph is given and that multiple graphs of the same order with subgraph instead of homomorphism counts are given.

Cite as

Jan Böker, Louis Härtel, Nina Runde, Tim Seppelt, and Christoph Standke. The Complexity of Homomorphism Reconstructibility. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 289, pp. 19:1-19:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{boker_et_al:LIPIcs.STACS.2024.19,
  author =	{B\"{o}ker, Jan and H\"{a}rtel, Louis and Runde, Nina and Seppelt, Tim and Standke, Christoph},
  title =	{{The Complexity of Homomorphism Reconstructibility}},
  booktitle =	{41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024)},
  pages =	{19:1--19:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-311-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{289},
  editor =	{Beyersdorff, Olaf and Kant\'{e}, Mamadou Moustapha and Kupferman, Orna and Lokshtanov, Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2024.19},
  URN =		{urn:nbn:de:0030-drops-197298},
  doi =		{10.4230/LIPIcs.STACS.2024.19},
  annote =	{Keywords: graph homomorphism, counting complexity, parameterised complexity}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2023.20

Probabilistic Query Evaluation with Bag Semantics

Authors: Martin Grohe, Peter Lindner, and Christoph Standke

Published in: LIPIcs, Volume 255, 26th International Conference on Database Theory (ICDT 2023)

Abstract

We initiate the study of probabilistic query evaluation under bag semantics where tuples are allowed to be present with duplicates. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi & Suciu, JACM 2012). Due to potentially unbounded multiplicities, the bag probabilistic databases we discuss are no longer finite objects, which requires a treatment of representation mechanisms. Moreover, the answer to a Boolean query is a probability distribution over non-negative integers, rather than a probability distribution over {true, false}. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join free conjunctive queries.

Cite as

Martin Grohe, Peter Lindner, and Christoph Standke. Probabilistic Query Evaluation with Bag Semantics. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{grohe_et_al:LIPIcs.ICDT.2023.20,
  author =	{Grohe, Martin and Lindner, Peter and Standke, Christoph},
  title =	{{Probabilistic Query Evaluation with Bag Semantics}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.20},
  URN =		{urn:nbn:de:0030-drops-177636},
  doi =		{10.4230/LIPIcs.ICDT.2023.20},
  annote =	{Keywords: Probabilistic Query Evaluation, Probabilistic Databases, Bag Semantics}
}

Search Results

Documents authored by Standke, Christoph

The Importance of Parameters in Database Queries

Abstract

Cite as

The Complexity of Homomorphism Reconstructibility

Abstract

Cite as

Probabilistic Query Evaluation with Bag Semantics

Abstract

Cite as