eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
1
484
10.4230/LIPIcs.ICDT.2024
article
LIPIcs, Volume 290, ICDT 2024, Complete Volume
Cormode, Graham
1
https://orcid.org/0000-0002-0698-0922
Shekelyan, Michael
2
https://orcid.org/0000-0002-6500-2192
University of Warwick, UK
Queen Mary University of London, UK
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024/LIPIcs.ICDT.2024.pdf
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
0:i
0:xvi
10.4230/LIPIcs.ICDT.2024.0
article
Front Matter, Table of Contents, Preface, Conference Organization
Cormode, Graham
1
https://orcid.org/0000-0002-0698-0922
Shekelyan, Michael
2
https://orcid.org/0000-0002-6500-2192
University of Warwick, UK
Queen Mary University of London, UK
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.0/LIPIcs.ICDT.2024.0.pdf
Front Matter
Table of Contents
Preface
Conference Organization
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
1:1
1:22
10.4230/LIPIcs.ICDT.2024.1
article
Natural Language Data Interfaces: A Data Access Odyssey (Invited Talk)
Koutrika, Georgia
1
https://orcid.org/0000-0002-7377-0116
Athena Research Center, Athens, Greece
Back in the 1970s, E. F. Codd worked on a prototype of a natural language question and answer application that would sit on top of a relational database system. Soon, natural language interfaces for databases (NLIDBs) became the holy grail for the database community. Different approaches have been proposed from the database, machine learning, and NLP communities. Interest in the topic has had its peaks and valleys. After a long and adventurous journey of almost 50 years, there is a rekindled interest in NLIDBs in recent years, fueled by the need to democratize data access and by the recent advances in deep learning and natural language processing in particular. There is a surge of works on natural language interfaces for databases using neural translation, and it has suddenly become hard to keep up with advancements in the field. Are we close to finding the holy grail of data access? What are the lurking challenges that we need to surpass, and what research opportunities arise? Finally, what is the role of the database community?
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.1/LIPIcs.ICDT.2024.1.pdf
natural language data interfaces
NLIDBs
NL-to-SQL
text-to-SQL
conversational databases
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
2:1
2:9
10.4230/LIPIcs.ICDT.2024.2
article
How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk)
Roy, Sudeepa
1
2
https://orcid.org/0009-0002-8300-7891
Gilad, Amir
3
https://orcid.org/0000-0002-3764-1958
Hu, Yihao
1
https://orcid.org/0009-0003-3048-3867
Meng, Hanze
1
https://orcid.org/0009-0003-8747-7716
Miao, Zhengjie
4
https://orcid.org/0009-0008-2371-1186
Stephens-Martinez, Kristin
5
https://orcid.org/0000-0002-3058-7418
Yang, Jun
1
https://orcid.org/0000-0003-0604-6790
Duke University, Durham, NC, USA
RelationalAI, Berkeley, CA, USA (Visiting Scientist)
Hebrew University of Jerusalem, Israel
Simon Fraser University, Burnaby, Canada
Duke University, Durham, NC, USA
Data analytics skills have become an indispensable part of any education that seeks to prepare its students for the modern workforce. Essential in this skill set is the ability to work with structured relational data. Relational queries are based on logic and may be declarative in nature, posing new challenges to novices and students. With manual teaching resources limited and enrollment growing rapidly, automated tools that help students debug queries and explain errors are potential game-changers in database education. We present a suite of tools, built on the foundations of database theory, that has been used by over 1600 students in database classes at Duke University, showcasing a high-impact application of database theory in database education.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.2/LIPIcs.ICDT.2024.2.pdf
Query Debugging
SQL
Relational Algebra
Relational Calculus
Database Education
Boolean Provenance
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
3:1
3:1
10.4230/LIPIcs.ICDT.2024.3
article
Rule-Based Ontologies: From Semantics to Syntax (Invited Talk)
Pieris, Andreas
1
2
https://orcid.org/0000-0003-4779-3469
University of Edinburgh, UK
University of Cyprus, Nicosia, Cyprus
An ontology specifies an abstract model of a domain of interest via a formal language that is typically based on logic. Tuple-generating dependencies (tgds) and equality-generating dependencies (egds), originally introduced as a unifying framework for database integrity constraints and later used in data exchange and integration, are well suited for modeling ontologies that are intended for data-intensive tasks. The reason is that, unlike other popular formalisms such as description logics, tgds and egds can easily handle the higher-arity relations that naturally occur in relational databases. In recent years, there has been an extensive study of tgd- and egd-based ontologies and of their applications to several different data-intensive tasks. In those studies, model theory plays a crucial role, and it typically proceeds from syntax to semantics. In other words, the syntax of an ontology language is introduced first, and then the properties of the mathematical structures that satisfy ontologies of that language are explored. There is, however, a mature and growing body of research in the reverse direction, i.e., from semantics to syntax. Here, the starting point is a collection of model-theoretic properties, and the goal is to determine whether or not these properties characterize some ontology language. Such results are welcome as they pinpoint the expressive power of an ontology language in terms of insightful model-theoretic properties. The main aim of this tutorial is to present a comprehensive overview of model-theoretic characterizations of tgd- and egd-based ontology languages that are encountered in database theory and symbolic artificial intelligence.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.3/LIPIcs.ICDT.2024.3.pdf
ontologies
tuple-generating dependencies
equality-generating dependencies
model theory
model-theoretic characterizations
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
4:1
4:20
10.4230/LIPIcs.ICDT.2024.4
article
Direct Access for Answers to Conjunctive Queries with Aggregation
Eldar, Idan
1
https://orcid.org/0009-0002-1664-8680
Carmeli, Nofar
2
https://orcid.org/0000-0003-0673-5510
Kimelfeld, Benny
1
https://orcid.org/0000-0002-7156-1572
Technion - Israel Institute of Technology, Haifa, Israel
Inria, LIRMM, Univ Montpellier, CNRS, France
We study the fine-grained complexity of conjunctive queries with grouping and aggregation. For some common aggregate functions (e.g., min, max, count, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated with a suitable commutative semiring. Specifically, we investigate the ability to evaluate such queries by constructing in log-linear time a data structure that provides logarithmic-time direct access to the answers ordered by a given lexicographic order. This task is nontrivial since the number of answers might be larger than log-linear in the size of the input, and so, the data structure needs to provide a compact representation of the space of answers.
In the absence of aggregation and annotation, past research provides a sufficient tractability condition on queries and orders. For queries without self-joins, this condition is not just sufficient, but also necessary (under conventional lower-bound assumptions in fine-grained complexity). We show that all past results continue to hold for annotated databases, assuming that the annotation itself is not part of the lexicographic order. On the other hand, we show infeasibility for the case of count-distinct that does not have any efficient representation as a commutative semiring. We then investigate the ability to include the aggregate and annotation outcome in the lexicographic order. Among the hardness results, standing out as tractable is the case of a semiring with an idempotent addition, such as those of min and max. Notably, this case captures also count-distinct over a logarithmic-size domain.
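The abstract's starting observation — that common aggregates become ordinary sum-products once tuples carry semiring annotations — can be illustrated with a small, hedged sketch. The relations, join keys, and the function name `sum_product` below are invented for illustration and are not from the paper:

```python
def sum_product(R, S, add, mul, zero):
    """R, S: {join_key: annotation}. Combine annotations of joining
    tuples with mul, then aggregate across the join with add."""
    total = zero
    for k, a in R.items():
        if k in S:
            total = add(total, mul(a, S[k]))
    return total

R = {"x": 2, "y": 5}
S = {"x": 3, "y": 7}
# counting semiring (N, +, *): SUM over the join
print(sum_product(R, S, lambda a, b: a + b, lambda a, b: a * b, 0))  # 2*3 + 5*7 = 41
# swapping + for max gives the largest product over the join
print(sum_product(R, S, max, lambda a, b: a * b, 0))  # 35
```

Instantiating `add`/`mul` with other semiring operations (e.g., min with +) yields the other aggregates mentioned above; max is an example of the idempotent addition that the paper singles out as tractable.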
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.4/LIPIcs.ICDT.2024.4.pdf
aggregate queries
conjunctive queries
provenance semirings
commutative semirings
annotated databases
direct access
ranking function
answer orderings
query classification
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
5:1
5:19
10.4230/LIPIcs.ICDT.2024.5
article
Communication Cost of Joins over Federated Data
Cucumides, Tamara
1
Reutter, Juan
2
University of Antwerp, Belgium
PUC Chile & IMFD Chile, Santiago, Chile
We study the problem of querying different data sources, which we assume are outside of our control and are made available through standard web communication protocols. In this scenario, the time spent communicating data often dominates the time spent processing local queries in each server. Thus, our focus is on algorithms that minimize the communication between the query processing server and the federated servers containing data.
However, any federated query can always be answered with linear communication, simply by requesting all the data from the federated sources. Further, one can show that certain queries do require this amount of communication. But sending all the data is clearly not a practical algorithm, so this worst-case analysis is not useful for our needs. There is a growing body of work on designing strategies that minimize communication in query federation, but these strategies are commonly based on heuristics, and we currently lack a formal analysis providing guidelines for the design of such strategies.
We focus on the communication complexity of federated joins when the problem is parameterized by a measure commonly referred to as the certificate of the instance: a framework that has been used before in the context of set intersection and local query processing. We show how to process any conjunctive query in time given by the certificate of the instance. Our algorithm is an adaptation of Minesweeper, one of the algorithms devised for local query processing, to our federated setting. When certificates are of the size of the instance, this amounts to sending the entire database, but our strategy provides drastic reductions in the communication needed for queries and instances with small certificates. We also show matching communication lower bounds for cases where the certificate is smaller than the size of the active domain of the instance.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.5/LIPIcs.ICDT.2024.5.pdf
databases
database queries
query federation
communication complexity
adaptive algorithms
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
6:1
6:21
10.4230/LIPIcs.ICDT.2024.6
article
Range Entropy Queries and Partitioning
Krishnan, Sanjay
1
https://orcid.org/0000-0001-6968-4090
Sintos, Stavros
2
https://orcid.org/0000-0002-2114-8886
Department of Computer Science, University of Chicago, IL, USA
Department of Computer Science, University of Illinois at Chicago, IL, USA
Data partitioning that maximizes or minimizes Shannon entropy is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partitioning algorithms can be accelerated if we have a data structure to find the entropy of different subsets of the data when the algorithm needs to decide what block to construct. While it is generally known how to compute the entropy of a discrete distribution efficiently, we want to efficiently derive the entropy among the data items that lie in a specific area. We solve this problem in a setting typical of real data, where data items are geometric points and each requested area is a query (hyper)rectangle. More specifically, we consider a set P of n weighted and colored points in ℝ^d. The goal is to construct a low-space data structure such that, given a query (hyper)rectangle R, it computes the entropy based on the colors of the points in P ∩ R in sublinear time. We show a conditional lower bound for this problem, proving that we cannot hope for data structures with near-linear space and near-constant query time. Then, we propose exact data structures for d = 1 and d > 1 with o(n^{2d}) space and o(n) query time. We also provide a tuning parameter t that the user can choose to bound the asymptotic space and query time of the new data structures. Next, we propose near-linear-space data structures for returning either an additive or a multiplicative approximation of the entropy. Finally, we show how we can use the new data structures to efficiently partition time series and histograms with respect to entropy.
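The query the abstract's data structures accelerate can be stated as a linear-scan baseline. This is a minimal sketch (points, colors, and the function name are invented), not the paper's sublinear structure:

```python
import math
from collections import Counter

def range_entropy(points, rect):
    """points: list of ((x, y), color); rect: ((xlo, ylo), (xhi, yhi)).
    Shannon entropy of the color distribution inside the rectangle."""
    (xlo, ylo), (xhi, yhi) = rect
    inside = [c for (x, y), c in points
              if xlo <= x <= xhi and ylo <= y <= yhi]
    if not inside:
        return 0.0
    n = len(inside)
    # H = -sum_c (n_c/n) * log2(n_c/n) over the color counts n_c
    return -sum((k / n) * math.log2(k / n) for k in Counter(inside).values())

pts = [((1, 1), "red"), ((2, 2), "blue"), ((3, 3), "red"), ((9, 9), "blue")]
print(range_entropy(pts, ((0, 0), (4, 4))))  # 2 red + 1 blue -> ~0.918 bits
```

The baseline costs O(n) per query; the point of the paper is to answer the same question in sublinear time from a precomputed structure.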
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.6/LIPIcs.ICDT.2024.6.pdf
Shannon entropy
range query
data structure
data partitioning
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
7:1
7:18
10.4230/LIPIcs.ICDT.2024.7
article
Skyline Operators for Document Spanners
Amarilli, Antoine
1
https://orcid.org/0000-0002-7977-4441
Kimelfeld, Benny
2
https://orcid.org/0000-0002-7156-1572
Labbé, Sébastien
3
Mengel, Stefan
4
LTCI, Télécom Paris, Institut Polytechnique de Paris, France
Technion - Israel Institute of Technology, Haifa, Israel
École normale supérieure, Paris, France
Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), France
When extracting a relation of spans (intervals) from a text document, a common practice is to filter out tuples of the relation that are deemed dominated by others. The domination rule is defined as a partial order that varies along different systems and tasks. For example, we may state that a tuple is dominated by tuples that extend it by assigning additional attributes, or assigning larger intervals. The result of filtering the relation would then be the skyline according to this partial order. As this filtering may remove most of the extracted tuples, we study whether we can improve the performance of the extraction by compiling the domination rule into the extractor.
To this aim, we introduce the skyline operator for declarative information extraction tasks expressed as document spanners. We show that this operator can be expressed via regular operations when the domination partial order can itself be expressed as a regular spanner, which covers several natural domination rules. Yet, we show that the skyline operator incurs a computational cost (under combined complexity). First, there are cases where the operator requires an exponential blowup on the number of states needed to represent the spanner as a sequential variable-set automaton. Second, the evaluation may become computationally hard. Our analysis more precisely identifies classes of domination rules for which the combined complexity is tractable or intractable.
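The filtering step described above is easy to state for one concrete domination rule. The sketch below (spans and names invented; the extractor itself is not modeled) keeps only spans not strictly contained in another extracted span — one of the natural domination rules the abstract mentions:

```python
def dominates(s, t):
    """s dominates t when s's interval strictly contains t's."""
    (sb, se), (tb, te) = s, t
    return sb <= tb and te <= se and (sb, se) != (tb, te)

def skyline(spans):
    """Keep only spans not dominated by any other extracted span."""
    return [t for t in spans if not any(dominates(s, t) for s in spans)]

spans = [(0, 10), (2, 5), (6, 9), (12, 15)]
print(skyline(spans))  # [(0, 10), (12, 15)]
```

The paper's question is whether this post-filtering can be compiled into the spanner itself, and at what cost in automaton size and evaluation complexity.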
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.7/LIPIcs.ICDT.2024.7.pdf
Information Extraction
Document Spanners
Query Evaluation
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
8:1
8:20
10.4230/LIPIcs.ICDT.2024.8
article
When Do Homomorphism Counts Help in Query Algorithms?
ten Cate, Balder
1
https://orcid.org/0000-0002-2538-5846
Dalmau, Victor
2
https://orcid.org/0000-0002-9365-7372
Kolaitis, Phokion G.
3
4
https://orcid.org/0000-0002-8407-8563
Wu, Wei-Lin
3
https://orcid.org/0009-0004-3341-1508
University of Amsterdam, The Netherlands
Universitat Pompeu Fabra, Barcelona, Spain
University of California Santa Cruz, CA, USA
IBM Almaden Research Center, San Jose, CA, USA
A query algorithm based on homomorphism counts is a procedure for determining whether a given instance satisfies a property by counting homomorphisms between the given instance and finitely many predetermined instances. In a left query algorithm, we count homomorphisms from the predetermined instances to the given instance, while in a right query algorithm we count homomorphisms from the given instance to the predetermined instances. Homomorphisms are usually counted over the semiring ℕ of non-negative integers; it is also meaningful, however, to count homomorphisms over the Boolean semiring 𝔹, in which case the homomorphism count indicates whether or not a homomorphism exists. We first characterize the properties that admit a left query algorithm over 𝔹 by showing that these are precisely the properties that are both first-order definable and closed under homomorphic equivalence. After this, we turn attention to a comparison between left query algorithms over 𝔹 and left query algorithms over ℕ. In general, there are properties that admit a left query algorithm over ℕ but not over 𝔹. The main result of this paper asserts that if a property is closed under homomorphic equivalence, then that property admits a left query algorithm over 𝔹 if and only if it admits a left query algorithm over ℕ. In other words and rather surprisingly, homomorphism counts over ℕ do not help as regards properties that are closed under homomorphic equivalence. Finally, we characterize the properties that admit both a left query algorithm over 𝔹 and a right query algorithm over 𝔹.
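The counts these query algorithms consume can be computed by brute force on tiny instances. A hedged toy (graphs invented; real instances are arbitrary relational structures): `hom(G, H)` counts homomorphisms from digraph G to digraph H over ℕ, and over the Boolean semiring 𝔹 one would only record whether this count is nonzero:

```python
from itertools import product

def hom(G, H):
    """Count homomorphisms from digraph G to digraph H, where a graph is
    (vertex list, set of directed edges). Brute force over all maps."""
    (VG, EG), (VH, EH) = G, H
    count = 0
    for image in product(VH, repeat=len(VG)):
        m = dict(zip(VG, image))
        if all((m[u], m[v]) in EH for (u, v) in EG):
            count += 1
    return count

edge = ([0, 1], {(0, 1)})           # a single directed edge
k2 = ([0, 1], {(0, 1), (1, 0)})     # two vertices with edges both ways
print(hom(edge, k2))  # 2: the edge maps onto either orientation
```

A left query algorithm fixes finitely many such G's and inspects the vector of counts hom(G, instance); a right query algorithm fixes the H's and inspects hom(instance, H).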
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.8/LIPIcs.ICDT.2024.8.pdf
query algorithms
homomorphism
homomorphism counts
conjunctive query
constraint satisfaction
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
9:1
9:19
10.4230/LIPIcs.ICDT.2024.9
article
Approximating Single-Source Personalized PageRank with Absolute Error Guarantees
Wei, Zhewei
1
https://orcid.org/0000-0003-3620-5086
Wen, Ji-Rong
1
https://orcid.org/0000-0002-9777-9676
Yang, Mingji
1
https://orcid.org/0000-0002-7748-2138
Renmin University of China, Beijing, China
Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes s and t on a graph G = (V,E), the PPR value π(s,t) is defined as the probability that an α-discounted random walk from s terminates at t, where the walk terminates with probability α at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node s to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates π̂(s,t) satisfy max_{t ∈ V} |π̂(s,t)-π(s,t)| ≤ ε for a given error bound ε. We propose an algorithm that achieves this with high probability, with an expected running time of
- Õ(√m/ε) for directed graphs, where m = |E|;
- Õ(√{d_max}/ε) for undirected graphs, where d_max is the maximum node degree in the graph;
- Õ(n^{γ-1/2}/ε) for power-law graphs, where n = |V| and γ ∈ (1/2,1) is the extent of the power law. These sublinear bounds improve upon existing results. We also study the case when degree-normalized absolute error guarantees are desired, requiring max_{t ∈ V} |π̂(s,t)/d(t)-π(s,t)/d(t)| ≤ ε_d for a given error bound ε_d, where the graph is undirected and d(t) is the degree of node t. We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of Õ(√{∑_{t ∈ V} π(s,t)/d(t)}/ε_d). This improves over the previously known O(1/ε_d) complexity.
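The random-walk definition of PPR above suggests an obvious Monte Carlo baseline. This sketch is not the paper's sublinear algorithm — it is the plain sampling estimator, with an invented toy graph, shown only to make the α-discounted walk concrete:

```python
import random

def estimate_ppr(graph, s, alpha=0.2, walks=20000, rng=random.Random(0)):
    """graph: {node: list of out-neighbors}. Simulate alpha-discounted
    walks from s; the fraction terminating at t estimates pi(s, t)."""
    hits = {v: 0 for v in graph}
    for _ in range(walks):
        u = s
        while rng.random() >= alpha:             # keep walking w.p. 1 - alpha
            nbrs = graph[u]
            u = rng.choice(nbrs) if nbrs else s  # dangling node: restart at s
        hits[u] += 1                             # walk terminated at u
    return {v: h / walks for v, h in hits.items()}

g = {0: [1], 1: [0]}    # a directed two-cycle
pi = estimate_ppr(g, 0)
# exact answers here: pi(0,0) = 1/(2 - alpha), pi(0,1) = (1-alpha)/(2 - alpha)
```

Achieving max-error ε this way needs roughly 1/ε² walks; the paper's contribution is the sublinear Õ(√m/ε)-type bounds quoted above.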
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.9/LIPIcs.ICDT.2024.9.pdf
Graph Algorithms
Sublinear Algorithms
Personalized PageRank
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
10:1
10:20
10.4230/LIPIcs.ICDT.2024.10
article
Right-Adjoints for Datalog Programs
ten Cate, Balder
1
https://orcid.org/0000-0002-2538-5846
Dalmau, Víctor
2
https://orcid.org/0000-0002-9365-7372
Opršal, Jakub
3
https://orcid.org/0000-0003-1245-3456
Institute for Logic, Language, and Computation, University of Amsterdam, The Netherlands
Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
School of Computer Science, University of Birmingham, UK
A Datalog program can be viewed as a syntactic specification of a mapping from database instances over some schema to database instances over another schema. We establish a large class of Datalog programs for which this mapping admits a (generalized) right-adjoint. We employ these results to obtain new insights into the existence of, and methods for constructing, homomorphism dualities within restricted classes of instances. From this, we derive new results regarding the existence of uniquely characterizing data examples for database queries in the presence of integrity constraints.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.10/LIPIcs.ICDT.2024.10.pdf
Datalog
Adjoints
Homomorphism Dualities
Database Constraints
Conjunctive Queries
Data Examples
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
11:1
11:20
10.4230/LIPIcs.ICDT.2024.11
article
On the Convergence Rate of Linear Datalog^∘ over Stable Semirings
Im, Sungjin
1
Moseley, Benjamin
2
Ngo, Hung
3
Pruhs, Kirk
4
University of California, Merced, CA, USA
Carnegie Mellon University, Pittsburgh, PA, USA
RelationalAI, Berkeley, CA, USA
University of Pittsburgh, Pittsburgh, PA, USA
Datalog^∘ is an extension of Datalog where, instead of a program being a collection of unions of conjunctive queries over the standard Boolean semiring, a program may now be a collection of sum-product queries over an arbitrary commutative partially ordered pre-semiring. Datalog^∘ is more powerful than Datalog in that its additional algebraic structure allows for supporting recursion with aggregation. At the same time, Datalog^∘ retains the syntactic and semantic simplicity of Datalog: Datalog^∘ has declarative least fixpoint semantics. The least fixpoint can be found via the naïve evaluation algorithm that repeatedly applies the immediate consequence operator until no further change is possible.
It was shown in [Mahmoud Abo Khamis et al., 2022] that, when the underlying semiring is p-stable, then the naïve evaluation of any Datalog^∘ program over the semiring converges in a finite number of steps. However, the upper bounds on the rate of convergence were exponential in the number n of ground IDB atoms.
This paper establishes polynomial upper bounds on the convergence rate of the naïve algorithm on linear Datalog^∘ programs, which are quite common in practice. In particular, the main result of this paper is that the convergence rate of linear Datalog^∘ programs under any p-stable semiring is O(pn³). Furthermore, we show a matching lower bound by constructing a p-stable semiring and a linear Datalog^∘ program that requires Ω(pn³) iterations for the naïve iteration algorithm to converge. Next, we study the convergence rate in terms of the number of elements in the semiring for linear Datalog^∘ programs. When L is the number of elements, the convergence rate is bounded by O(pn log L). This significantly improves the convergence rate for small L. We show a nearly matching lower bound as well.
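Naïve evaluation of a linear Datalog^∘ rule is easy to demonstrate on the tropical (min, +) semiring, where the fixpoint is shortest distances. A hedged sketch with an invented chain graph (not the paper's lower-bound construction), counting applications of the immediate consequence operator:

```python
INF = float("inf")

def naive_eval(nodes, edges, source):
    """Naive evaluation of the linear rule D(y) <- min_x (D(x) + W(x, y))
    with D(source) = 0, over the tropical (min, +) semiring. Returns the
    least fixpoint and the number of rounds until no change occurs."""
    D = {v: (0 if v == source else INF) for v in nodes}
    rounds = 0
    while True:
        rounds += 1
        new = {v: min([D[v]] + [D[u] + w for (u, x), w in edges.items() if x == v])
               for v in nodes}
        if new == D:            # fixpoint reached
            return D, rounds
        D = new

nodes = [0, 1, 2, 3]
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 5.0}
dist, rounds = naive_eval(nodes, edges, 0)
print(dist, rounds)  # shortest distances from node 0; 4 rounds on this chain
```

On a chain of n nodes the naïve algorithm needs Θ(n) rounds here; the paper bounds the analogous round count for arbitrary p-stable semirings.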
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.11/LIPIcs.ICDT.2024.11.pdf
Datalog
convergence rate
semiring
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
12:1
12:20
10.4230/LIPIcs.ICDT.2024.12
article
Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility
Muñoz Serrano, Thomas
1
https://orcid.org/0000-0003-0004-5041
Riveros, Cristian
2
3
https://orcid.org/0000-0003-0832-116X
Vansummeren, Stijn
1
https://orcid.org/0000-0001-7793-9049
UHasselt, Data Science Institute, Diepenbeek, Belgium
Pontificia Universidad Católica de Chile, Santiago, Chile
Millennium Institute for Foundational Research on Data, Santiago, Chile
Due to the importance of linear algebra and matrix operations in data analytics, there is significant interest in using relational query optimization and processing techniques for evaluating (sparse) linear algebra programs. In particular, in recent years, close connections have been established between linear algebra programs and relational algebra that allow transferring optimization techniques of the latter to the former. In this paper, we ask ourselves which linear algebra programs in MATLANG correspond to the free-connex and q-hierarchical fragments of conjunctive first-order logic. Both fragments have desirable query processing properties: free-connex conjunctive queries support constant-delay enumeration after a linear-time preprocessing phase, and q-hierarchical conjunctive queries further allow constant-time updates. By characterizing the corresponding fragments of MATLANG, we hence identify the fragments of linear algebra programs that one can evaluate with constant-delay enumeration after linear-time preprocessing and with constant-time updates. To derive our results, we improve and generalize previous correspondences between MATLANG and relational algebra evaluated over semiring-annotated relations. In addition, we identify properties on semirings that allow us to generalize the complexity bounds for free-connex and q-hierarchical conjunctive queries from Boolean annotations to general semirings.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.12/LIPIcs.ICDT.2024.12.pdf
Query evaluation
conjunctive queries
linear algebra
enumeration algorithms
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
13:1
13:20
10.4230/LIPIcs.ICDT.2024.13
article
Direct Access for Conjunctive Queries with Negations
Capelli, Florent
1
https://orcid.org/0000-0002-2842-8223
Irwin, Oliver
2
https://orcid.org/0000-0002-8986-1506
Univ. Artois, CNRS, UMR 8188, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
Université de Lille, CNRS, Inria, UMR 9189 - CRIStAL, F-59000 Lille, France
Given a conjunctive query Q and a database 𝐃, direct access to the answers of Q over 𝐃 is the operation of returning, given an index j, the j-th answer for some order on its answers. While this problem is #P-hard in general with respect to combined complexity, many conjunctive queries have an underlying structure that allows for direct access to their answers for some lexicographical ordering, taking polylogarithmic time in the size of the database after a polynomial-time precomputation. Previous work has precisely characterised the tractable classes and given fine-grained lower bounds on the precomputation time needed depending on the structure of the query. In this paper, we generalise these tractability results to the case of signed conjunctive queries, that is, conjunctive queries that may contain negative atoms. Our technique is based on a class of circuits that can represent relational data. We first show that this class supports tractable direct access after a polynomial-time preprocessing. We then give bounds on the size of the circuit needed to represent the answer set of signed conjunctive queries depending on their structure. Both results combined allow us to prove the tractability of direct access for a large class of conjunctive queries. On the one hand, we recover the known tractable classes from the literature in the case of positive conjunctive queries. On the other hand, we generalise and unify known tractability results about negative conjunctive queries - that is, queries having only negated atoms. In particular, we show that the class of β-acyclic negative conjunctive queries and the class of bounded nest-set width negative conjunctive queries admit tractable direct access.
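The indexing idea behind direct access can be shown on the simplest possible answer set: a Cartesian product in lexicographic order. This toy (invented relations; real direct-access algorithms work through query decompositions and, in this paper, circuits) returns the j-th answer without materializing the product:

```python
def direct_access(R, S, j):
    """Return the j-th answer of the product R x S in lexicographic
    order, computed arithmetically rather than by enumeration."""
    R, S = sorted(R), sorted(S)
    if not 0 <= j < len(R) * len(S):
        raise IndexError(j)
    q, r = divmod(j, len(S))    # block index into R, offset into S
    return (R[q], S[r])

print(direct_access([3, 1, 2], ["b", "a"], 0))  # (1, 'a')
print(direct_access([3, 1, 2], ["b", "a"], 5))  # (3, 'b')
```

Each access costs O(1) after sorting; the hard part, which the paper addresses, is supporting the same operation when the answer set is defined by a (signed) join rather than a product.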
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.13/LIPIcs.ICDT.2024.13.pdf
Conjunctive queries
factorized databases
direct access
hypertree decomposition
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
14:1
14:17
10.4230/LIPIcs.ICDT.2024.14
article
The Importance of Parameters in Database Queries
Grohe, Martin
1
https://orcid.org/0000-0002-0292-9142
Kimelfeld, Benny
2
https://orcid.org/0000-0002-7156-1572
Lindner, Peter
3
https://orcid.org/0000-0003-2041-7201
Standke, Christoph
1
https://orcid.org/0000-0002-3034-730X
RWTH Aachen University, Germany
Technion - Israel Institute of Technology, Haifa, Israel
École Polytechnique Fédérale de Lausanne, Switzerland
We propose and study a framework for quantifying the importance of the choices of parameter values to the result of a query over a database. These parameters occur as constants in logical queries, such as conjunctive queries. In our framework, the importance of a parameter is its SHAP score. This score is a popular instantiation of the game-theoretic Shapley value for measuring the importance of feature values in machine learning models. We make the case for the rationale of using this score by explaining the intuition behind SHAP, and by showing that we arrive at this score in two different, apparently opposing, approaches to quantifying the contribution of a parameter.
The application of the SHAP score requires two components in addition to the query and the database: (a) a probability distribution over the combinations of parameter values, and (b) a utility function that measures the similarity between the result for the original parameters and the result for hypothetical parameters. The main question addressed in the paper is the complexity of calculating the SHAP score for different distributions and similarity measures. We first address the case of probabilistically independent parameters. The problem is hard if we consider a fragment of queries that is hard to evaluate (as one would expect), and even for the fragment of acyclic conjunctive queries. In some cases, though, one can efficiently list all relevant parameter combinations, and then the SHAP score can be computed in polynomial time under reasonably general conditions. Also tractable is the case of full acyclic conjunctive queries for certain (natural) similarity functions. We extend our results to conjunctive queries with inequalities between variables and parameters. Finally, we discuss a simple approximation technique for the case of correlated parameters.
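The Shapley value underlying the SHAP score can be computed exactly on toy inputs by averaging marginal contributions over all orders. The utility function below is a made-up stand-in, not one of the paper's similarity measures, and the permutation enumeration is exponential — illustration only:

```python
from itertools import permutations
from math import factorial

def shapley(players, utility):
    """Exact Shapley values: each player's marginal contribution to the
    utility, averaged over all orders in which players are fixed."""
    scores = {p: 0.0 for p in players}
    for order in permutations(players):
        fixed = set()
        for p in order:
            before = utility(frozenset(fixed))
            fixed.add(p)
            scores[p] += utility(frozenset(fixed)) - before
    n_orders = factorial(len(players))
    return {p: s / n_orders for p, s in scores.items()}

# made-up utility: 1 once parameters "a" and "b" are both fixed
u = lambda S: 1.0 if {"a", "b"} <= S else 0.0
phi = shapley(["a", "b", "c"], u)
print(phi)  # "a" and "b" share the credit equally; "c" gets 0
```

In the paper's setting, "players" are query parameters and the utility compares the query result under the hypothetical parameter values with the original result; the complexity question is when this average can be computed without enumerating orders.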
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.14/LIPIcs.ICDT.2024.14.pdf
SHAP score
query parameters
Shapley value
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
15:1
15:20
10.4230/LIPIcs.ICDT.2024.15
article
Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability
Amarilli, Antoine
1
https://orcid.org/0000-0002-7977-4441
van Bremen, Timothy
2
https://orcid.org/0009-0004-0538-3044
Meel, Kuldeep S.
3
https://orcid.org/0000-0001-9423-5270
LTCI, Télécom Paris, Institut Polytechnique de Paris, France
National University of Singapore, Singapore
University of Toronto, Canada
Query evaluation over probabilistic databases is a notoriously intractable problem - not only in combined complexity, but for many natural queries in data complexity as well [Antoine Amarilli et al., 2017; Nilesh N. Dalvi and Dan Suciu, 2012]. This motivates the study of probabilistic query evaluation through the lens of approximation algorithms, and particularly of combined FPRASes, whose runtime is polynomial in both the query and instance size. In this paper, we focus on tuple-independent probabilistic databases over binary signatures, which can be equivalently viewed as probabilistic graphs. We study in which cases we can devise combined FPRASes for probabilistic query evaluation in this setting.
We settle the complexity of this problem for a variety of query and instance classes, by proving both approximability and (conditional) inapproximability results. This allows us to deduce many corollaries of possible independent interest. For example, we show how the results of [Marcelo Arenas et al., 2021] on counting fixed-length strings accepted by an NFA imply the existence of an FPRAS for the two-terminal network reliability problem on directed acyclic graphs: this was an open problem until now [Rico Zenklusen and Marco Laumanns, 2011]. We also show that one cannot extend a recent result [Timothy van Bremen and Kuldeep S. Meel, 2023] that gives a combined FPRAS for self-join-free conjunctive queries of bounded hypertree width on probabilistic databases: neither the bounded-hypertree-width condition nor the self-join-freeness hypothesis can be relaxed. Finally, we complement all our inapproximability results with unconditional lower bounds, showing that DNNF provenance circuits must have at least moderately exponential size in combined complexity.
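For intuition, the quantity being approximated can be computed exactly on tiny instances by enumerating possible worlds; a minimal Python sketch (hypothetical interface) of this exponential baseline, whose cost is precisely what a combined FPRAS avoids:

```python
from itertools import product

def query_probability(edges, probs, query):
    """Exact probability that `query` holds on a tuple-independent
    probabilistic graph, by enumerating all 2^|edges| possible worlds.
    probs: edge -> marginal probability; query: edge set -> bool."""
    total = 0.0
    for keep in product([False, True], repeat=len(edges)):
        world = {e for e, k in zip(edges, keep) if k}
        p = 1.0
        for e, k in zip(edges, keep):
            p *= probs[e] if k else 1 - probs[e]
        if query(world):
            total += p
    return total
```

For a two-edge path s → a → t with both edges present independently with probability 1/2, the probability that s reaches t is 1/4.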
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.15/LIPIcs.ICDT.2024.15.pdf
Probabilistic query evaluation
tuple-independent databases
approximation
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
16:1
16:17
10.4230/LIPIcs.ICDT.2024.16
article
Optimally Rewriting Formulas and Database Queries: A Confluence of Term Rewriting, Structural Decomposition, and Complexity
Chen, Hubie
1
Mengel, Stefan
2
Department of Informatics, King’s College London, UK
Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), France
A central computational task in database theory, finite model theory, and computer science at large is the evaluation of a first-order sentence on a finite structure. In the context of this task, the width of a sentence, defined as the maximum number of free variables over all subformulas, has been established as a crucial measure, where minimizing width of a sentence (while retaining logical equivalence) is considered highly desirable. An undecidability result rules out the possibility of an algorithm that, given a first-order sentence, returns a logically equivalent sentence of minimum width; this result motivates the study of width minimization via syntactic rewriting rules, which is this article’s focus. For a number of common rewriting rules (which are known to preserve logical equivalence), including rules that allow for the movement of quantifiers, we present an algorithm that, given a positive first-order sentence ϕ, outputs the minimum-width sentence obtainable from ϕ via application of these rules. We thus obtain a complete algorithmic understanding of width minimization up to the studied rules; this result is the first one - of which we are aware - that establishes this type of understanding in such a general setting. Our result builds on the theory of term rewriting and establishes an interface among this theory, query evaluation, and structural decomposition theory.
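The width measure itself is straightforward to compute for a concrete sentence. A small sketch under an assumed tuple-based encoding of positive formulas (the encoding is illustrative, not from the article) also shows why quantifier movement matters: pushing a quantifier inward can lower the width.

```python
def free_vars(phi):
    """Free variables of a positive formula encoded as nested tuples:
    ('atom', vars), ('and'/'or', f1, f2), ('exists', x, f)."""
    op = phi[0]
    if op == 'atom':
        return set(phi[1])
    if op in ('and', 'or'):
        return free_vars(phi[1]) | free_vars(phi[2])
    if op == 'exists':
        return free_vars(phi[2]) - {phi[1]}
    raise ValueError(op)

def width(phi):
    """Width = maximum number of free variables over all subformulas."""
    w = len(free_vars(phi))
    if phi[0] in ('and', 'or'):
        w = max(w, width(phi[1]), width(phi[2]))
    elif phi[0] == 'exists':
        w = max(w, width(phi[2]))
    return w
```

For example, ∃y (E(x,y) ∧ ∃z E(y,z)) has width 2, while the logically equivalent ∃y ∃z (E(x,y) ∧ E(y,z)) has width 3: moving the quantifier ∃z outward creates a subformula with three free variables.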
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.16/LIPIcs.ICDT.2024.16.pdf
width
query rewriting
structural decomposition
term rewriting
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
17:1
17:19
10.4230/LIPIcs.ICDT.2024.17
article
Containment of Regular Path Queries Under Path Constraints
Salvati, Sylvain
1
https://orcid.org/0000-0002-6230-0098
Tison, Sophie
1
https://orcid.org/0000-0002-8426-6230
Université de Lille, INRIA, CRIStAL/CNRS UMR 9189, France
Data integrity is ensured by expressing constraints that the data should satisfy. One can also view constraints as data properties and take advantage of them for several tasks, such as reasoning about data or accelerating query processing. In the context of graph databases, simple constraints can be expressed by means of path constraints, while simple queries are modeled as regular path queries (RPQs). In this paper, we investigate the containment of RPQs under path constraints. We focus on word constraints, which can be viewed as tuple-generating dependencies (TGDs) of the form
∀x_1,x_2, ∃ȳ, a_1(x_1,y_1) ∧ ... ∧ a_i(y_{i-1},y_i) ∧ ... ∧ a_n(y_{n-1},x_2) ⟶
∃z̄, b_1(x_1,z_1) ∧ ... ∧ b_i(z_{i-1},z_i) ∧ ... ∧ b_m(z_{m-1},x_2).
Such a constraint means that whenever two nodes in a graph are connected by a path labeled a_1 … a_n, there is also a path labeled b_1 … b_m that connects them. Rewrite systems offer an abstract view of these TGDs: the rewrite rule a_1 … a_n → b_1 … b_m represents the previous constraint. A set of constraints 𝒞 is then represented by a rewrite system R and, when dealing with possibly infinite databases, a path query p is contained in a path query q under the constraints 𝒞 iff p rewrites to q with R. Contrary to what has been claimed in the literature, we show that, when restricting to finite databases only, there are cases where a path query p is contained in a path query q under the constraints 𝒞 while p does not rewrite to q with R. More generally, we study the finite controllability of the containment of RPQs under word constraints, that is, the question of when this containment problem over unrestricted databases coincides with its restriction to finite databases. We give an exact characterisation of the cases where this equivalence holds. We then deduce the undecidability of the containment problem in the finite case, even when RPQs are restricted to word queries. We prove several properties related to finite controllability, and in particular that it is undecidable. We also exhibit some classes of word constraints that ensure the finite controllability and the decidability of the containment problem.
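The rewriting view lends itself to a simple bounded search: since deciding reachability in a word rewrite system is undecidable in general, the sketch below (names illustrative) explores only a fixed number of rewrite steps breadth-first.

```python
from collections import deque

def rewrites_to(p, q, rules, max_steps=5):
    """Check whether word p rewrites to word q with rules (lhs -> rhs),
    exploring the rewrite relation breadth-first for at most max_steps
    applications (the unbounded problem is undecidable, hence the bound)."""
    seen = {p}
    frontier = deque([p])
    for _ in range(max_steps):
        nxt = deque()
        while frontier:
            w = frontier.popleft()
            if w == q:
                return True
            for lhs, rhs in rules:
                start = w.find(lhs)
                while start != -1:
                    # Replace one occurrence of lhs by rhs.
                    w2 = w[:start] + rhs + w[start + len(lhs):]
                    if w2 not in seen:
                        seen.add(w2)
                        nxt.append(w2)
                    start = w.find(lhs, start + 1)
        frontier = nxt
    return q in seen
```

With the single constraint ab → c, the word query aab rewrites to ac (so aab is contained in ac over unrestricted databases), whereas ab does not rewrite to ba.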
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.17/LIPIcs.ICDT.2024.17.pdf
Graph databases
rational path queries
query containment
TGDs
word constraints
rewrite systems
finite controllability
decision problems
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
18:1
18:20
10.4230/LIPIcs.ICDT.2024.18
article
Computing Data Distribution from Query Selectivities
Agarwal, Pankaj K.
1
https://orcid.org/0000-0002-9439-181X
Raychaudhury, Rahul
1
Sintos, Stavros
2
https://orcid.org/0000-0002-2114-8886
Yang, Jun
1
https://orcid.org/0000-0003-0604-6790
Department of Computer Science, Duke University, Durham, NC, USA
Department of Computer Science, University of Illinois at Chicago, IL, USA
We are given a set 𝒵 = {(R_1,s_1), …, (R_n,s_n)}, where each R_i is a range in ℝ^d, such as a rectangle or a ball, and s_i ∈ [0,1] denotes its selectivity. The goal is to compute a small-size discrete data distribution 𝒟 = {(q₁,w₁),…, (q_m,w_m)}, where q_j ∈ ℝ^d and w_j ∈ [0,1] for each 1 ≤ j ≤ m, and ∑_{1≤j≤m} w_j = 1, such that 𝒟 is the most consistent with 𝒵, i.e., err_p(𝒟,𝒵) = 1/n ∑_{i = 1}ⁿ |s_i - ∑_{j=1}^m w_j⋅1(q_j ∈ R_i)|^p is minimized. In a database setting, 𝒵 corresponds to a workload of range queries over some table, together with their observed selectivities (i.e., the fraction of tuples returned), and 𝒟 can be used as a compact model for approximating the data distribution within the table without accessing the underlying contents.
In this paper, we obtain both upper and lower bounds for this problem. In particular, we show that the problem of finding the best data distribution from selectivity queries is NP-complete. On the positive side, we describe a Monte Carlo algorithm that constructs, in time O((n+δ^{-d}) δ^{-2} polylog n), a discrete distribution 𝒟̃ of size O(δ^{-2}), such that err_p(𝒟̃,𝒵) ≤ min_𝒟 err_p(𝒟,𝒵)+δ (for p = 1,2,∞) where the minimum is taken over all discrete distributions. We also establish conditional lower bounds, which strongly indicate the infeasibility of relative approximations, as well as of removing the exponential dependency on the dimension for additive approximations. This suggests that significant improvements to our algorithm are unlikely.
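The error measure err_p is easy to evaluate for a given candidate distribution; a minimal sketch (hypothetical interface, with each range given as a membership predicate) for finite p:

```python
def err_p(dist, workload, p=1):
    """err_p(D, Z) = (1/n) * sum_i |s_i - sum_{q_j in R_i} w_j|^p.
    dist: list of (point, weight) pairs; workload: list of
    (contains, selectivity), where contains(point) tests point in R_i."""
    n = len(workload)
    total = 0.0
    for contains, s in workload:
        estimate = sum(w for q, w in dist if contains(q))
        total += abs(s - estimate) ** p
    return total / n
```

For a one-dimensional distribution putting weight 1/2 on 0 and 1/2 on 1, a workload with observed selectivities 0.5 on [-0.5, 0.5] and 0.8 on [-0.5, 1.5] yields err_1 = (0 + 0.2)/2 = 0.1.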
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.18/LIPIcs.ICDT.2024.18.pdf
selectivity queries
discrete distributions
Multiplicative Weights Update
eps-approximation
learnable functions
depth problem
arrangement
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
19:1
19:20
10.4230/LIPIcs.ICDT.2024.19
article
Information Inequality Problem over Set Functions
Hannula, Miika
1
https://orcid.org/0000-0002-9637-6664
University of Helsinki, Finland
Information inequalities appear in many database applications such as query output size bounds, query containment, and implication between data dependencies. Recently, Abo Khamis et al. [Mahmoud Abo Khamis et al., 2020] proposed to study the algorithmic aspects of information inequalities, including the information inequality problem: decide whether a linear inequality over entropies of random variables is valid. While the decidability of this problem is a major open question, applications often involve only inequalities that adhere to specific syntactic forms linked to useful semantic invariance properties. This paper studies the information inequality problem in different syntactic and semantic scenarios that arise from database applications. Focusing on the boundary between tractability and intractability, we show that the information inequality problem is coNP-complete if restricted to normal polymatroids, and in polynomial time if relaxed to monotone functions. We also examine syntactic restrictions related to query output size bounds, and provide an alternative proof, through monotone functions, for the polynomial-time computability of the entropic bound over simple sets of degree constraints.
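While validity of an information inequality quantifies over all distributions, any single concrete distribution can falsify it. The sketch below (names illustrative) computes marginal entropies from an explicit joint distribution, which is enough to test instances such as submodularity, H(XY) + H(YZ) ≥ H(XYZ) + H(Y):

```python
from math import log2

def entropy(joint, vars_, subset):
    """Entropy H(subset) of the marginal of `joint`, a dict mapping full
    assignments (tuples aligned with vars_) to probabilities."""
    idx = [vars_.index(v) for v in subset]
    marginal = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)
```

On the XOR distribution (X, Y uniform independent bits, Z = X ⊕ Y) we get H(X) = 1, H(XY) = H(XYZ) = 2, and submodularity holds with slack: 2 + 2 ≥ 2 + 1.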
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.19/LIPIcs.ICDT.2024.19.pdf
entropy
information theory
worst-case output size
computational complexity
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
20:1
20:20
10.4230/LIPIcs.ICDT.2024.20
article
Conditional Independence on Semiring Relations
Hannula, Miika
1
https://orcid.org/0000-0002-9637-6664
University of Helsinki, Finland
Conditional independence plays a foundational role in database theory, probability theory, information theory, and graphical models. In databases, a notion similar to conditional independence, known as the (embedded) multivalued dependency, appears in database normalization. Many properties of conditional independence are shared across various domains, and to some extent these commonalities can be studied through a measure-theoretic approach. The present paper proposes an alternative approach via semiring relations, defined by extending database relations with tuple annotations from some commutative semiring. Integrating various interpretations of conditional independence in this context, we investigate how the choice of the underlying semiring impacts the corresponding axiomatic and decomposition properties. We specifically identify positivity and multiplicative cancellativity as the key semiring properties that enable extending results from the relational context to the broader semiring framework. Additionally, we explore the relationships between different conditional independence notions through model theory.
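In the counting semiring, for instance, conditional independence Y ⟂ Z | X amounts to a factorization of annotation sums, r(xyz)·r(x) = r(xy)·r(xz), where r(x), r(xy), r(xz) are marginals obtained by summing annotations. A small sketch under this one interpretation (illustrative encoding: tuples as Python tuples, attribute sets as index tuples):

```python
from collections import defaultdict

def marginal(rel, idx):
    """Sum annotations of `rel` (tuple -> annotation) grouped by indices idx."""
    m = defaultdict(float)
    for t, a in rel.items():
        m[tuple(t[i] for i in idx)] += a
    return m

def satisfies_ci(rel, X, Y, Z):
    """Check Y ⟂ Z | X over the counting semiring:
    r(xyz) * r(x) == r(xy) * r(xz) for every combination of projections."""
    rx = marginal(rel, X)
    rxy, rxz = marginal(rel, X + Y), marginal(rel, X + Z)
    rxyz = marginal(rel, X + Y + Z)
    for xy in rxy:
        for xz in rxz:
            if xy[:len(X)] != xz[:len(X)]:
                continue  # projections disagree on X
            x = xy[:len(X)]
            xyz = xy + xz[len(X):]
            if abs(rxyz.get(xyz, 0.0) * rx[x] - rxy[xy] * rxz[xz]) > 1e-9:
                return False
    return True
```

A relation whose Y- and Z-values vary freely for each X-value satisfies the independence, while one pairing specific Y- and Z-values does not.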
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.20/LIPIcs.ICDT.2024.20.pdf
semiring
conditional independence
functional dependency
decomposition
axiom
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
21:1
21:20
10.4230/LIPIcs.ICDT.2024.21
article
Subgraph Enumeration in Optimal I/O Complexity
Deng, Shiyuan
1
Tao, Yufei
1
The Chinese University of Hong Kong, China
Given a massive data graph G = (V, E) and a small pattern graph Q, the goal of subgraph enumeration is to list all the subgraphs of G isomorphic to Q. In the external memory (EM) model, it is well-known that every indivisible algorithm must perform Ω({|E|^ρ}/{M^{ρ-1} B}) I/Os in the worst case, where M represents the number of words in (internal) memory, B denotes the number of words in a disk block, and ρ is the fractional edge covering number of Q. It has been a longstanding open problem to design an algorithm to match this lower bound. The state of the art is an algorithm in ICDT'23 that achieves an I/O complexity of O({|E|^ρ}/{M^{ρ-1} B} log_{M/B} |E|/B) with high probability. In this paper, we remove the log_{M/B} |E|/B factor, thereby settling the open problem when randomization is permitted.
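For reference, the in-memory version of the problem has a trivial baseline; the sketch below (hypothetical interface, nowhere near the external-memory bounds above) fixes the semantics of listing the subgraphs of G isomorphic to Q, here as injective, edge-preserving mappings of pattern vertices:

```python
from itertools import permutations

def subgraph_occurrences(V, E, QV, QE):
    """List occurrences of directed pattern (QV, QE) in data graph (V, E):
    injective maps of pattern vertices that preserve every pattern edge.
    Brute force over vertex tuples; only sensible for tiny inputs."""
    E, QV = set(E), list(QV)
    occurrences = []
    for image in permutations(V, len(QV)):
        h = dict(zip(QV, image))
        if all((h[a], h[b]) in E for a, b in QE):
            occurrences.append(h)
    return occurrences
```

On the directed triangle 0 → 1 → 2 → 0, the directed triangle pattern has exactly three occurrences (its three rotations).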
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.21/LIPIcs.ICDT.2024.21.pdf
Subgraph Enumeration
Conjunctive Queries
External Memory
Algorithms
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
22:1
22:20
10.4230/LIPIcs.ICDT.2024.22
article
Evaluating Graph Queries Using Semantic Treewidth
Feier, Cristina
1
Gogacz, Tomasz
1
Murlak, Filip
1
University of Warsaw, Poland
Unions of conjunctive two-way regular path queries (UC2RPQs) are a common abstraction of query languages for graph databases, much like unions of conjunctive queries (UCQs) in the relational case. As in the case of UCQs, their evaluation is NP-complete in combined complexity. Semantic treewidth, i.e. the minimal treewidth over all equivalent queries, has been proposed as a candidate criterion to characterize fixed-parameter tractability of UC2RPQs. It was recently shown how to decide the semantic treewidth of a UC2RPQ, by constructing the best under-approximation of a given treewidth, in the form of a UC2RPQ of size doubly exponential in the size of the original query. This leads to an fpt algorithm for evaluating UC2RPQs of semantic treewidth k which runs in time doubly exponential in the size of the parameter, i.e. in the size of the UC2RPQ. Here we describe a more efficient fpt algorithm for evaluating UC2RPQs of semantic treewidth k which runs in time singly exponential in the size of the parameter. We do this by a careful construction of a witness query which, while still being doubly exponential, can be represented as a Datalog program of bounded width and singly exponential size.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.22/LIPIcs.ICDT.2024.22.pdf
conjunctive two-way regular path queries
fixed-parameter tractable evaluation
semantic treewidth
Datalog encoding
optimization
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
23:1
23:20
10.4230/LIPIcs.ICDT.2024.23
article
Join Sampling Under Acyclic Degree Constraints and (Cyclic) Subgraph Sampling
Wang, Ru
1
Tao, Yufei
1
The Chinese University of Hong Kong, China
Given a (natural) join with an acyclic set of degree constraints (the join itself does not need to be acyclic), we show how to draw a uniformly random sample from the join result in O(polymat/max{1, OUT}) expected time (assuming data complexity) after a preprocessing phase of O(IN) expected time, where IN, OUT, and polymat are the join’s input size, output size, and polymatroid bound, respectively. This compares favorably with the state of the art (Deng et al. and Kim et al., both in PODS'23), which states that, in the absence of degree constraints, a uniformly random sample can be drawn in Õ(AGM/max{1, OUT}) expected time after a preprocessing phase of Õ(IN) expected time, where AGM is the join’s AGM bound and Õ(.) hides a polylog(IN) factor. Our algorithm applies to every join supported by the solutions of Deng et al. and Kim et al. Furthermore, since the polymatroid bound is at most the AGM bound, our performance guarantees are never worse, but can be considerably better, than those of Deng et al. and Kim et al.
We then utilize our techniques to tackle directed subgraph sampling, a problem that has extensive database applications and bears close relevance to joins. Let G = (V, E) be a directed data graph where each vertex has an out-degree at most λ, and let P be a directed pattern graph with a constant number of vertices. The objective is to uniformly sample an occurrence of P in G. The problem can be modeled as join sampling with input size IN = Θ(|E|) but, whenever P contains cycles, the converted join has cyclic degree constraints. We show that it is always possible to throw away certain degree constraints such that (i) the remaining constraints are acyclic and (ii) the new join has asymptotically the same polymatroid bound polymat as the old one. Combining this finding with our new join sampling solution yields an algorithm to sample from the original (cyclic) join (thereby yielding a uniformly random occurrence of P) in O(polymat/max{1, OUT}) expected time after O(|E|) expected-time preprocessing, where OUT is the number of occurrences.
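The flavor of degree-constrained rejection sampling can be illustrated on the simplest pattern, a directed 2-path, under an out-degree bound λ. The sketch below (an illustration of the idea, not the paper's algorithm) accepts with probability outdeg(head)/λ, so every 2-path is returned with probability exactly 1/(|E|·λ), i.e. uniformly:

```python
import random

def sample_two_path(edges, lam, rng=random):
    """Draw a uniformly random directed 2-path (u -> v -> w) from a graph
    whose out-degrees are all <= lam. Pick a uniform edge, then a uniform
    out-neighbour of its head, accepting with probability outdeg(head)/lam.
    Returns None on rejection; the caller retries until acceptance."""
    out = {}
    for u, v in edges:
        out.setdefault(u, []).append(v)
    u, v = rng.choice(edges)
    neighbours = out.get(v, [])
    if not neighbours or rng.random() >= len(neighbours) / lam:
        return None
    return (u, v, rng.choice(neighbours))
```

Repeating until acceptance yields a uniform occurrence; the expected number of retries is what such sampling bounds control.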
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.23/LIPIcs.ICDT.2024.23.pdf
Join Sampling
Subgraph Sampling
Degree Constraints
Polymatroid Bounds
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
24:1
24:20
10.4230/LIPIcs.ICDT.2024.24
article
Finding Smallest Witnesses for Conjunctive Queries
Hu, Xiao
1
https://orcid.org/0000-0002-7890-665X
Sintos, Stavros
2
https://orcid.org/0000-0002-2114-8886
University of Waterloo, Canada
University of Illinois Chicago, IL, USA
A witness is a sub-database that preserves the query results of the original database while being much smaller. It has wide applications in query rewriting and debugging, query explanation, IoT analytics, multi-layer network routing, etc. In this paper, we study the smallest witness problem (SWP) for the class of conjunctive queries (CQs) without self-joins.
We first establish the dichotomy that SWP for a CQ can be computed in polynomial time if and only if the query has the head-cluster property, unless P = NP. We next turn to the approximate version, relaxing the requirement that the witness be of minimum size. We surprisingly find that the head-domination property, which has been identified for the deletion propagation problem [Kimelfeld et al., 2012], also precisely captures the hardness of the approximate smallest witness problem. In polynomial time, SWP for any CQ with the head-domination property can be approximated within a constant factor, while SWP for any CQ without this property cannot be approximated within a logarithmic factor, unless P = NP.
We further explore efficient approximation algorithms for CQs without the head-domination property: (1) we show a trivial algorithm that achieves a polynomially large approximation ratio for general CQs; (2) for any CQ with only one non-output attribute, such as star CQs, we show a greedy algorithm with a logarithmic approximation ratio; (3) for line CQs, which contain at least two non-output attributes, we relate the SWP problem to the directed Steiner forest problem, whose algorithms can be applied to line CQs directly. Meanwhile, we establish a much higher lower bound, exponentially larger than the logarithmic lower bound obtained above. It remains open to close the gap between the lower and upper bounds of the approximate SWP for CQs without the head-domination property.
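At its core, a greedy algorithm with a logarithmic ratio for witness selection is greedy set cover; a generic sketch of that flavor (hypothetical interface: produces(t) returns the query results witnessed by input tuple t; this illustrates the idea, not the paper's exact algorithm):

```python
def greedy_witness(tuples, results, produces):
    """Repeatedly add the input tuple that witnesses the most still-uncovered
    query results: the classic ln(n)-approximation for set cover."""
    uncovered = set(results)
    witness = []
    while uncovered:
        best = max(tuples, key=lambda t: len(produces(t) & uncovered))
        gain = produces(best) & uncovered
        if not gain:
            raise ValueError("results not coverable by the given tuples")
        witness.append(best)
        uncovered -= gain
    return witness
```

For example, with tuples witnessing {1,2,3}, {3,4}, and {4} respectively, greedy first takes the 3-result tuple, then one tuple covering result 4, giving a witness of size 2.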
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.24/LIPIcs.ICDT.2024.24.pdf
conjunctive query
smallest witness
head-cluster
head-domination
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-14
290
25:1
25:18
10.4230/LIPIcs.ICDT.2024.25
article
Ranked Enumeration for MSO on Trees via Knowledge Compilation
Amarilli, Antoine
1
https://orcid.org/0000-0002-7977-4441
Bourhis, Pierre
2
https://orcid.org/0000-0001-5699-0320
Capelli, Florent
3
https://orcid.org/0000-0002-2842-8223
Monet, Mikaël
4
https://orcid.org/0000-0002-6158-4607
LTCI, Télécom Paris, Institut Polytechnique de Paris, France
Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
Univ. Artois, CNRS, UMR 8188, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
Université de Lille, CNRS, Inria, UMR 9189 - CRIStAL, F-59000 Lille, France
We study the problem of enumerating the satisfying assignments for certain circuit classes from knowledge compilation, where assignments are ranked in a specific order. In particular, we show how this problem can be used to efficiently perform ranked enumeration of the answers to MSO queries over trees, with the order being given by a ranking function satisfying a subset-monotonicity property.
Assuming that the number of variables is constant, we show that we can enumerate the satisfying assignments in ranked order for so-called multivalued circuits that are smooth, decomposable, and in negation normal form (smooth multivalued DNNF). There is no preprocessing and the enumeration delay is linear in the size of the circuit times the number of values, plus a logarithmic term in the number of assignments produced so far. If we further assume that the circuit is deterministic (smooth multivalued d-DNNF), we can achieve linear-time preprocessing in the circuit, and the delay only features the logarithmic term.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol290-icdt2024/LIPIcs.ICDT.2024.25/LIPIcs.ICDT.2024.25.pdf
Enumeration
knowledge compilation
monadic second-order logic