Track B: Automata, Logic, Semantics, and Theory of Programming
Homogeneity and Homogenizability: Hard Problems for the Logic SNP

Authors: Jakub Rydval

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

The infinite-domain CSP dichotomy conjecture extends the finite-domain CSP dichotomy theorem to reducts of finitely bounded homogeneous structures. Every countable finitely bounded homogeneous structure is uniquely described by a universal first-order sentence up to isomorphism, and every reduct of such a structure by a sentence of the logic SNP. By Fraïssé’s Theorem, testing the existence of a finitely bounded homogeneous structure for a given universal first-order sentence is equivalent to testing the amalgamation property for the class of its finite models. The present paper motivates a complexity-theoretic view on the classification problem for finitely bounded homogeneous structures. We show that this meta-problem is EXPSPACE-hard or PSPACE-hard, depending on whether the input is specified by a universal sentence or a set of forbidden substructures. By relaxing the input to SNP sentences and the question to the existence of a structure with a finitely bounded homogeneous expansion, we obtain a different meta-problem, closely related to the question of homogenizability. We show that this second meta-problem is already undecidable, even if the input SNP sentence comes from the Datalog fragment and uses at most binary relation symbols. As a byproduct of our proof, we also get the undecidability of some other properties for Datalog programs, e.g., whether they can be rewritten in the logic MMSNP, whether they solve some finite-domain CSP, or whether they define a structure with a homogeneous Ramsey expansion in a finite relational signature.

Jakub Rydval. Homogeneity and Homogenizability: Hard Problems for the Logic SNP. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 150:1-150:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Current and Future Challenges in Knowledge Representation and Reasoning (Dagstuhl Perspectives Workshop 22282)

Authors: James P. Delgrande, Birte Glimm, Thomas Meyer, Miroslaw Truszczynski, and Frank Wolter

Published in: Dagstuhl Manifestos, Volume 10, Issue 1 (2024)

Knowledge Representation and Reasoning is a central, longstanding, and active area of Artificial Intelligence. Over the years it has evolved significantly; more recently it has been challenged and complemented by research in areas such as machine learning and reasoning under uncertainty. In July 2022, a Dagstuhl Perspectives workshop was held on Knowledge Representation and Reasoning. The goal of the workshop was to describe the state of the art in the field, including its relation with other areas, its shortcomings and strengths, together with recommendations for future progress. We developed this manifesto based on the presentations, panels, working groups, and discussions that took place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge Representation: its origins, goals, milestones, and current foci; its relation to other disciplines, especially to Artificial Intelligence; and on its challenges, along with key priorities for the next decade.

James P. Delgrande, Birte Glimm, Thomas Meyer, Miroslaw Truszczynski, and Frank Wolter. Current and Future Challenges in Knowledge Representation and Reasoning (Dagstuhl Perspectives Workshop 22282). In Dagstuhl Manifestos, Volume 10, Issue 1, pp. 1-61, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Grounding Stream Reasoning Research

Authors: Pieter Bonte, Jean-Paul Calbimonte, Daniel de Leng, Daniele Dell'Aglio, Emanuele Della Valle, Thomas Eiter, Federico Giannini, Fredrik Heintz, Konstantin Schekotihin, Danh Le-Phuoc, Alessandra Mileo, Patrik Schneider, Riccardo Tommasini, Jacopo Urbani, and Giacomo Ziffer

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 1 (2024): Special Issue on Trends in Graph Data and Knowledge - Part 2

In the last decade, there has been a growing interest in applying AI technologies to implement complex data analytics over data streams. To this end, researchers in various fields have been organising a yearly event called the "Stream Reasoning Workshop" to share perspectives, challenges, and experiences around this topic. In this paper, the previous organisers of the workshops and other community members provide a summary of the main research results that have been discussed during the first six editions of the event. These results can be categorised into four main research areas: The first is concerned with the technological challenges related to handling large data streams. The second area aims at adapting and extending existing semantic technologies to data streams. The third and fourth areas focus on how to implement reasoning techniques, either considering deductive or inductive techniques, to extract new and valuable knowledge from the data in the stream. This summary is written not only to provide a crystallisation of the field, but also to point out distinctive traits of the stream reasoning community. Moreover, it also provides a foundation for future research by enumerating a list of use cases and open challenges, to stimulate others to join this exciting research area.

Pieter Bonte, Jean-Paul Calbimonte, Daniel de Leng, Daniele Dell'Aglio, Emanuele Della Valle, Thomas Eiter, Federico Giannini, Fredrik Heintz, Konstantin Schekotihin, Danh Le-Phuoc, Alessandra Mileo, Patrik Schneider, Riccardo Tommasini, Jacopo Urbani, and Giacomo Ziffer. Grounding Stream Reasoning Research. In Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 1, pp. 2:1-2:47, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

When Do Homomorphism Counts Help in Query Algorithms?

Authors: Balder ten Cate, Victor Dalmau, Phokion G. Kolaitis, and Wei-Lin Wu

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

A query algorithm based on homomorphism counts is a procedure for determining whether a given instance satisfies a property by counting homomorphisms between the given instance and finitely many predetermined instances. In a left query algorithm, we count homomorphisms from the predetermined instances to the given instance, while in a right query algorithm we count homomorphisms from the given instance to the predetermined instances. Homomorphisms are usually counted over the semiring ℕ of non-negative integers; it is also meaningful, however, to count homomorphisms over the Boolean semiring 𝔹, in which case the homomorphism count indicates whether or not a homomorphism exists. We first characterize the properties that admit a left query algorithm over 𝔹 by showing that these are precisely the properties that are both first-order definable and closed under homomorphic equivalence. After this, we turn attention to a comparison between left query algorithms over 𝔹 and left query algorithms over ℕ. In general, there are properties that admit a left query algorithm over ℕ but not over 𝔹. The main result of this paper asserts that if a property is closed under homomorphic equivalence, then that property admits a left query algorithm over 𝔹 if and only if it admits a left query algorithm over ℕ. In other words and rather surprisingly, homomorphism counts over ℕ do not help as regards properties that are closed under homomorphic equivalence. Finally, we characterize the properties that admit both a left query algorithm over 𝔹 and a right query algorithm over 𝔹.

Balder ten Cate, Victor Dalmau, Phokion G. Kolaitis, and Wei-Lin Wu. When Do Homomorphism Counts Help in Query Algorithms? In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 8:1-8:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
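
As a rough illustration of the notions in the abstract above, the following sketch counts homomorphisms between small directed graphs by brute force and contrasts counting over ℕ with counting over 𝔹. It is not taken from the paper: the graph encoding, the function names count_homs and left_profile, and the example instances are all assumptions made for illustration.

```python
from itertools import product


def count_homs(src, dst):
    """Count homomorphisms from src to dst by brute force.

    A graph is a pair (vertices, edges): a list of vertices and a set of
    directed (u, v) pairs. Runtime is exponential in the size of src.
    """
    src_vertices, src_edges = src
    dst_vertices, dst_edges = dst
    count = 0
    # Try every mapping of src vertices to dst vertices.
    for image in product(dst_vertices, repeat=len(src_vertices)):
        h = dict(zip(src_vertices, image))
        # h is a homomorphism if it preserves every edge of src.
        if all((h[u], h[v]) in dst_edges for u, v in src_edges):
            count += 1
    return count


def left_profile(predetermined, instance, boolean=False):
    """Counts from each predetermined instance to the given instance.

    With boolean=True the counts are collapsed to existence (semiring B);
    otherwise the actual numbers are returned (semiring N).
    """
    counts = [count_homs(f, instance) for f in predetermined]
    return [c > 0 for c in counts] if boolean else counts


# Hypothetical predetermined instances: a single edge and a 2-edge path.
EDGE = ([0, 1], {(0, 1)})
PATH2 = ([0, 1, 2], {(0, 1), (1, 2)})

# Hypothetical input instances: a directed 3-cycle and a single edge.
C3 = ([0, 1, 2], {(0, 1), (1, 2), (2, 0)})
ONE_EDGE = ([0, 1], {(0, 1)})

print(left_profile([EDGE, PATH2], C3))                      # [3, 3]
print(left_profile([EDGE, PATH2], ONE_EDGE))                # [1, 0]
print(left_profile([EDGE, PATH2], C3, boolean=True))        # [True, True]
print(left_profile([EDGE, PATH2], ONE_EDGE, boolean=True))  # [True, False]
```

A left query algorithm in the sense of the abstract would decide the property from such a profile alone. The count from EDGE separates the two instances over ℕ (3 versus 1) but not over 𝔹, which is the kind of extra information that, according to the main result, does not help for properties closed under homomorphic equivalence.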

Right-Adjoints for Datalog Programs

Authors: Balder ten Cate, Víctor Dalmau, and Jakub Opršal

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

A Datalog program can be viewed as a syntactic specification of a mapping from database instances over some schema to database instances over another schema. We establish a large class of Datalog programs for which this mapping admits a (generalized) right-adjoint. We employ these results to obtain new insights into the existence of, and methods for constructing, homomorphism dualities within restricted classes of instances. From this, we derive new results regarding the existence of uniquely characterizing data examples for database queries in the presence of integrity constraints.

Balder ten Cate, Víctor Dalmau, and Jakub Opršal. Right-Adjoints for Datalog Programs. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Craig Interpolation for Decidable Fragments of First-Order Logic (Invited Talk)

Authors: Balder ten Cate

Published in: LIPIcs, Volume 288, 32nd EACSL Annual Conference on Computer Science Logic (CSL 2024)

The Craig Interpolation Property (CIP) is a property of logics. It states that, for all formulas φ and ψ, if φ ⊧ ψ, then there exists an "interpolant" ϑ such that φ ⊧ ϑ and ϑ ⊧ ψ, and such that all non-logical symbols occurring in ϑ occur both in φ and in ψ. Craig [Craig, 1957] proved in 1957 that first-order logic (FO) has this property. Since then, many refinements of Craig’s result have been obtained (e.g., [Otto, 2000; Benedikt et al., 2016]). These have found applications in various areas of computer science and AI, including formal verification, modular hard/software specification and automated deduction [McMillan, 2018; Calvanese et al., 2020; Hoder et al., 2012], and more recently prominently in databases [Toman and Weddell, 2011; Benedikt et al., 2016] and knowledge representation [Lutz and Wolter, 2011; ten Cate et al., 2013; Koopmann and Schmidt, 2015]. In this invited talk, we will survey recent work pertaining to Craig interpolation for various important decidable fragments of first-order logic, including guarded fragments, finite-variable fragments, and ordered fragments. Most of these fragments lack the CIP (the guarded-negation fragment GNFO being a notable exception [Bárány et al., 2013]). We will discuss strategies that have been proposed in recent literature to deal with this lack of CIP, as well as recent results that shed light on where, within the landscape of decidable fragments of first-order logic, one may find logics that enjoy CIP [Jung and Wolter, 2021; ten Cate and Comer, 2023].

Balder ten Cate. Craig Interpolation for Decidable Fragments of First-Order Logic (Invited Talk). In 32nd EACSL Annual Conference on Computer Science Logic (CSL 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 288, pp. 2:1-2:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Conjunctive Queries: Unique Characterizations and Exact Learnability

Authors: Balder ten Cate and Victor Dalmau

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

We answer the question of which conjunctive queries are uniquely characterized by polynomially many positive and negative examples, and how to construct such examples efficiently. As a consequence, we obtain a new efficient exact learning algorithm for a class of conjunctive queries. At the core of our contributions lie two new polynomial-time algorithms for constructing frontiers in the homomorphism lattice of finite structures. We also discuss implications for the unique characterizability and learnability of schema mappings and of description logic concepts.

Balder ten Cate and Victor Dalmau. Conjunctive Queries: Unique Characterizations and Exact Learnability. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 9:1-9:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Recursive Programs for Document Spanners

Authors: Liat Peterfreund, Balder ten Cate, Ronald Fagin, and Benny Kimelfeld

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A well-studied language for expressing spanners is that of the regular spanners: relational algebra over regex formulas, which are regular expressions with capture variables. Equivalently, the regular spanners are the ones expressible in non-recursive Datalog over regex formulas (which extract relations that constitute the extensional database). This paper explores the expressive power of recursive Datalog over regex formulas. We show that such programs can express precisely the document spanners computable in polynomial time. We compare this expressiveness to known formalisms such as the closure of regex formulas under the relational algebra and string equality. Finally, we extend our study to a recently proposed framework that generalizes both the relational model and the document spanners.

Liat Peterfreund, Balder ten Cate, Ronald Fagin, and Benny Kimelfeld. Recursive Programs for Document Spanners. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Declarative Probabilistic Programming with Datalog

Authors: Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena

Published in: LIPIcs, Volume 48, 19th International Conference on Database Theory (ICDT 2016)

Probabilistic programming languages are used for developing statistical models, and they typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate an extension of Datalog for specifying statistical models, and establish a declarative probabilistic-programming paradigm over databases. Our proposed extension provides convenient mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. The resulting semantics is robust under different chases and invariant to rewritings that preserve logical equivalence.

Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. Declarative Probabilistic Programming with Datalog. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

The Product Homomorphism Problem and Applications

Authors: Balder ten Cate and Victor Dalmau

Published in: LIPIcs, Volume 31, 18th International Conference on Database Theory (ICDT 2015)

The product homomorphism problem (PHP) takes as input a finite collection of structures A_1, ..., A_n and a structure B, and asks if there is a homomorphism from the direct product of A_1, ..., A_n to B. We pinpoint the computational complexity of this problem. Our motivation stems from the fact that PHP naturally arises in different areas of database theory. In particular, it is equivalent to the problem of determining whether a relation is definable by a conjunctive query, and to the problem of deciding the existence of a schema mapping that fits a given collection of positive and negative data examples. We apply our results to obtain complexity bounds for these problems.

Balder ten Cate and Victor Dalmau. The Product Homomorphism Problem and Applications. In 18th International Conference on Database Theory (ICDT 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 31, pp. 161-176, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)
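
To make the problem statement in the abstract above concrete, here is a minimal brute-force sketch of PHP for directed graphs. It is an illustration only, not the method of the paper: the encoding of structures as (vertices, edges) pairs and the names direct_product, has_hom, and product_hom are assumptions, and the search is exponential.

```python
from itertools import product


def direct_product(graphs):
    """Direct product of directed graphs given as (vertices, edges) pairs.

    Vertices are tuples with one component per factor; (u, v) is an edge
    iff every component pair (u[i], v[i]) is an edge of the i-th factor.
    """
    vertices = list(product(*(vs for vs, _ in graphs)))
    edges = {
        (u, v)
        for u in vertices
        for v in vertices
        if all((u[i], v[i]) in es for i, (_, es) in enumerate(graphs))
    }
    return vertices, edges


def has_hom(src, dst):
    """Return True iff some edge-preserving map from src to dst exists."""
    src_vertices, src_edges = src
    dst_vertices, dst_edges = dst
    for image in product(dst_vertices, repeat=len(src_vertices)):
        h = dict(zip(src_vertices, image))
        if all((h[u], h[v]) in dst_edges for u, v in src_edges):
            return True
    return False


def product_hom(graphs, target):
    """Decide PHP: is there a homomorphism from the product to target?"""
    return has_hom(direct_product(graphs), target)


def cycle(n):
    """Directed cycle on vertices 0..n-1 (hypothetical helper)."""
    return list(range(n)), {(i, (i + 1) % n) for i in range(n)}


C2, C3, C6 = cycle(2), cycle(3), cycle(6)

print(has_hom(C2, C6))            # False
print(has_hom(C3, C6))            # False
print(product_hom([C2, C3], C6))  # True
```

In this example neither factor maps to the directed 6-cycle on its own, but their direct product is itself a directed 6-cycle, so the product does map to it. This shows that PHP cannot be answered by testing the factors separately.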

Querying and Reasoning Under Expressive Constraints (Dagstuhl Seminar 14331)

Authors: Michael Benedikt, Carsten Lutz, and Balder Ten Cate

Published in: Dagstuhl Reports, Volume 4, Issue 8 (2015)

This report documents the program and the outcomes of Dagstuhl Seminar 14331 "Querying and Reasoning Under Expressive Constraints" which took place from August 10th to August 14th, 2014. The seminar aimed to bring together researchers in databases, knowledge representation, decidable fragments of first-order logic, and constraint satisfaction to identify and discuss common themes and techniques as well as complementary ones, identify future research issues, and foster cooperation and cross-fertilization between the communities.

Michael Benedikt, Carsten Lutz, and Balder Ten Cate. Querying and Reasoning Under Expressive Constraints (Dagstuhl Seminar 14331). In Dagstuhl Reports, Volume 4, Issue 8, pp. 1-20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Schema Mappings and Data Examples: Deriving Syntax from Semantics (Invited Talk)

Authors: Phokion G. Kolaitis

Published in: LIPIcs, Volume 13, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2011)

Schema mappings are high-level specifications that describe the relationship between two database schemas. Schema mappings are considered to be the essential building blocks in such critical data interoperability tasks as data exchange and data integration. For this reason, they have been the focus of extensive research investigations over the past several years. Since in real-life applications schema mappings can be quite complex, it is important to develop methods and tools for illustrating, explaining, and deriving schema mappings. A promising approach to this effect is to use “good” data examples that illustrate the schema mapping at hand. In this talk, we present an overview of recent work on characterizing and deriving schema mappings via a finite set of data examples. We show that every LAV schema mapping (i.e., a schema mapping specified by a finite set of local-as-view tuple-generating dependencies) is uniquely characterized by a finite set of universal data examples with respect to the class of all LAV schema mappings. We also show that this type of result does not hold for arbitrary GAV schema mappings (i.e., schema mappings specified by a finite set of global-as-view tuple-generating dependencies). After this, we give a necessary and sufficient algorithmic condition for a GAV schema mapping to be uniquely characterizable by a finite set of universal examples with respect to the class of all GAV schema mappings. Along the way, we establish tight connections between unique characterizability of schema mappings and homomorphism dualities. This is joint work with Bogdan Alexe (IBM Research - Almaden), Balder ten Cate (UC Santa Cruz), and Wang-Chiew Tan (UC Santa Cruz and IBM Research - Almaden) based on [1, 2, 3].

Phokion G. Kolaitis. Schema Mappings and Data Examples: Deriving Syntax from Semantics (Invited Talk). In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2011). Leibniz International Proceedings in Informatics (LIPIcs), Volume 13, p. 25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)

Unary negation

Authors: Balder ten Cate and Luc Segoufin

Published in: LIPIcs, Volume 9, 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)

We study fragments of first-order logic and of least fixed point logic that allow only unary negation: negation of formulas with at most one free variable. These logics generalize many interesting known formalisms, including modal logic and the mu-calculus, as well as conjunctive queries and monadic Datalog. We show that satisfiability and finite satisfiability are decidable for both fragments, and we pinpoint the complexity of satisfiability, finite satisfiability, and model checking. We also show that the unary negation fragment of first-order logic is model-theoretically very well behaved. In particular, it enjoys Craig interpolation and the Beth property.

Balder ten Cate and Luc Segoufin. Unary negation. In 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011). Leibniz International Proceedings in Informatics (LIPIcs), Volume 9, pp. 344-355, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)

