Tractable Conjunctive Queries over Static and Dynamic Relations
Abstract
We investigate the evaluation of conjunctive queries over static and dynamic relations. While static relations are given as input and do not change, dynamic relations are subject to inserts and deletes.
We characterise syntactically three classes of queries that admit constant update time and constant enumeration delay. We call such queries tractable. Depending on the class, the preprocessing time is linear, polynomial, or exponential (under data complexity, so the query size is constant).
To decide whether a query is tractable, it does not suffice to analyse separately the sub-queries over the static relations and over the dynamic relations, respectively. Instead, we need to take the interaction between the static and the dynamic relations into account. Even when the sub-query over the dynamic relations is not tractable, the overall query can become tractable if the dynamic relations are sufficiently constrained by the static ones.
Keywords and phrases:
fully dynamic algorithm, constant enumeration delay, constant update timeCopyright and License:
![[Uncaptioned image]](x1.png)
2012 ACM Subject Classification:
Theory of computation Database query processing and optimization (theory) ; Information systems Database views ; Information systems Data streamsAcknowledgements:
Olteanu would like to thank the IVM team at RelationalAI and in particular Niko Göbel for discussions that motivated this work. Luo’s contribution was done while he was with the University of Zurich.Funding:
The authors would like to acknowledge the UZH Global Strategy and Partnerships Funding Scheme and the EPSRC grant EP/T022124/1.Editors:
Sudeepa Roy and Ahmet KaraSeries and Publisher:

1 Introduction
Incremental view maintenance, also known as fully dynamic query evaluation, is a fundamental task in data management [22, 13, 10, 16, 28, 19, 29, 9]. A natural question is to understand which conjunctive queries are tractable, i.e., which queries admit constant time per single-tuple update (insert or delete) and also constant delay for the enumeration of the result tuples. The problem setting also allows for some one-off preprocessing phase to construct a data structure that supports the updates and the enumeration. Prior work showed that the -hierarchical queries are the conjunctive queries without repeating relation symbols that are tractable; such queries admit constant update time and enumeration delay, even if we only allow for linear time preprocessing [4]. All other conjunctive queries without repeating relation symbols cannot admit both constant update time and constant enumeration delay, even if we allow arbitrary time for preprocessing. If we only allow inserts (and no deletes), then every free-connex -acyclic conjunctive query becomes tractable [29, 21]. The tractable queries with free access patterns, where the free variables are partitioned into input and output, naturally extend the -hierarchical queries, which are queries without input variables [18]. Beyond this well-behaved query class, there are known intractable queries such as the triangle queries [15, 16] and hierarchical conjunctive queries with arbitrary free variables [17, 20] for which algorithms are known to achieve worst-case, yet not constant, optimal update time and enumeration delay. Restrictions to the data and the update sequence can turn intractable queries into tractable ones: for instance, when the update sequence has a bounded enclosureness values [29] or when the data satisfies functional dependencies [19], degree bounds [5], or more general integrity constraints [6].
All aforementioned works consider the all-dynamic setting, where all relations are updatable. In this work, we extend the tractability frontier by considering a mixed setting, where the input database can have both dynamic relations, which are subject to change, and static relations, which do not change. The mixed setting appears naturally in practice, e.g., in the retail domain where some relations (e.g., Inventory, Sales) are updated very frequently while others (e.g., Stores, Demographics) are only updated sporadically if at all. We show that by differentiating between static and dynamic relations, we can design efficient query maintenance strategies tailored to the mixed setting.
Main Contributions
We characterise syntactically three classes of tractable conjunctive queries depending on their preprocessing time, cf. Figure 1: . The queries are categorised based on their structure and on the structure of their static and dynamic sub-queries. These classes are defined in Section 4.
The class characterises the tractable queries with linear time preprocessing:
Theorem 1.
Let a conjunctive query and a database of size .
-
If is in , then it can be maintained with preprocessing time, update time, and enumeration delay.
-
If is not in and does not have repeating relation symbols, then it cannot be maintained with preprocessing time, update time, and enumeration delay, unless the Online Matrix-Vector Multiplication or the Boolean Matrix-Matrix Multiplication conjecture fail.
The upper bound in Theorem 1 relies on a rewriting of the query using strata of views, where the views are defined using projections or joins of input relations and possibly other views (Section 3). We call such rewritings safe if the views can be maintained in constant time for single-tuple updates and support constant-delay enumeration of the query result. We show that every query has a safe rewriting and the views can be computed in linear time (Section 5). The lower bound in Theorem 1 relies on two widely used conjectures: the Online Matrix-Vector Multiplication [12] and the Boolean Matrix-Matrix Multiplication [3]. The proof of the lower bound is outlined in Section 7.
Example 2.
In the all-static setting, becomes the class of free-connex -acyclic conjunctive queries, which are those conjunctive queries that admit constant enumeration delay after linear time preprocessing [3]. In the all-dynamic setting, becomes the class of -hierarchical conjunctive queries, which are those conjunctive queries that admit constant enumeration delay and constant update time after linear time preprocessing. Every query in is free-connex -acyclic and its dynamic sub-query, which is obtained by removing the static relations, is -hierarchical. Yet the queries in need to satisfy further syntactic constraints on the connections between their static and dynamic relations: For instance, and from Figure 2 are not in even though they are free-connex -acyclic and their dynamic sub-queries are -hierarchical.
The queries in are also tractable, yet they require super-linear preprocessing time, unless the Online Matrix-Vector Multiplication or the Boolean Matrix-Matrix Multiplication conjectures fail. We introduce a new width measure for conjunctive queries, called the preprocessing width, to characterise the preprocessing time of queries in :
Theorem 3.
Let a conjunctive query , the preprocessing width of , and a database of size . The query can be maintained with preprocessing time, update time, and enumeration delay.
Like the queries in , every query in also admits a safe rewriting (Section 5).
Example 4.
The query from Figure 2 is contained in : It is tractable but requires quadratic time preprocessing. The quadratic blowup is due to the creation of a view that is the join of the static relations and on the bound variable .
The preprocessing width is not captured by previous width notions, such as the fractional hypertree width () of either the static sub-query or of the entire query [23], as exemplified next. Consider the free-connex -acyclic query , whose is 1. Its static sub-query is the triangle join and has . The preprocessing width of is 1, so it is in . The triangle join has and its static sub-query has . Yet the preprocessing width of is 2: We materialise the static sub-query . We may reduce the preprocessing width to 3/2 for the static sub-query by also joining with the dynamic relation , yet the modified view becomes dynamic and needs to be maintained under updates to . However, this maintenance cannot be achieved with constant update time, while allowing for constant enumeration delay [16].
The class contains a broad set of tractable queries that may use up to exponential preprocessing time in the size of the input database:
Theorem 5.
Let a conjunctive query , the static sub-query of , the fractional edge cover number of , , and a database whose static relations have size . If is in , then it can be maintained with preprocessing time, update time, and enumeration delay.
The class is merely of theoretical interest since it comes with preprocessing time that is exponential in the size of the input database. The reason for this high preprocessing time is to ensure constant enumeration delay; if we would allow the enumeration delay to become linear, then the preprocessing time would collapse to linear for the -acyclic conjunctive queries in . The fractional edge cover number characterises the worst-case optimal result size of the static sub-query of [2].
Example 6.
The query from Figure 2 is in . This query can be maintained with constant update time and enumeration delay at the expense of exponential preprocessing time: The possible results of are any of the possible subsets of the static relation of size , while the updates to the dynamic relations only act as selectors in this powerset. The dynamic sub-query of , which is defined as the sub-query over the dynamic relations, is -hierarchical. The static sub-query of , which is defined as the sub-query over the only static relation, is trivially free-connex -acyclic. This means that, when taken in isolation, the dynamic sub-query can be evaluated with constant update time and enumeration delay after linear time preprocessing, while the static sub-query can be evaluated with constant enumeration delay after linear time preprocessing. Yet is not in : It does not admit constant update time and enumeration delay after linear time preprocessing. As we show in Section 8, the queries , , and from Figure 2 are not in , and we do not know whether they are tractable.
The queries in may not admit safe rewritings that rely solely on joins and projections. Take again from Figure 2. There is no safe rewriting of this query that solely relies on projections and joins. Any rewriting that supports constant-delay enumeration of the query result must contain a materialised view that either joins: (i) and ; (ii) and ; or (iii) and . None of these views can be maintained with constant update time. Consider the view . An insert of a value to requires to iterate over all -values paired with in in order to propagate the change to the view . The number of such -values can be linear. Hence, the update time is linear. Section 6 sketches our evaluation strategy for the queries in .
2 Preliminaries
For a natural number , we define . In case , we have .
Databases with Static and Dynamic Relations
Following standard terminology, a relation is a finite set of tuples and a database is a finite set of relations [1]. The size of a relation is the number of its tuples. The size of a database is given by the sum of the sizes of its relations. The relation is dynamic if it is subject to updates; otherwise, it is static. To emphasize that is static or dynamic, we write or respectively .
Conjunctive Queries
A conjunctive query (CQ or query for short) has the form
where , , and are relation symbols; are dynamic body atoms; are static body atoms; is the head atom; is the set of variables of ; is the set of free variables; is the set of bound variables; is the set of body atoms; is the set of body atoms with variable . The static (dynamic) sub-query () is obtained from as follows: Its body is the conjunction of all static (dynamic) atoms of and its free variables are the free variables of that appear in static (dynamic) atoms of . We visualise queries as hypergraphs where nodes are query variables (with the free variables underlined), solid red hyperedges represent dynamic atoms, and dotted blue hyperedges represent static atoms.
Example 7.
Consider the query and its hypergraph from Figure 2. Its static and dynamic sub-queries are and respectively .
A query is -acyclic if we can construct a tree, called join tree, such that: (1) the nodes of the tree are the body atoms of the query, and (2) for each variable, the following holds: if the variable appears in two body atoms, then it appears in all body atoms on the unique path between these two body atoms in the tree [8]. A query is free-connex -acyclic if it is -acyclic and remains -acyclic after adding the head atom to its body [7].
A query is hierarchical if for any two variables and , it holds that , , or [27]. A query is -hierarchical if it is hierarchical and for any two variables and with , it holds that: if is free, then is free [4].
A path in a query is a sequence of variables in such that each variable appears at most once in the sequence and each two adjacent variables and are contained in a body atom from , . We use set operations on paths as on sets of their variables. Two variables and are connected if there is a path . Two atoms and are connected if there are connected variables and . A set of body atoms in is a connected component of if any two atoms in are connected and this does not hold for any superset of .
Example 8.
The query from Figure 2 has the path that connects (i) the variables and and (ii) the atoms and .
Dynamic Query Evaluation
The problem of dynamic query evaluation is as follows: Given a query and a database with static and dynamic relations, the problem is to maintain the query result under updates to the dynamic relations and to enumerate the tuples in the query result.
A single-tuple update to a relation is an insert of a tuple into or a delete of a tuple from . We denote the insert of a tuple by and the delete of by . In this paper, we consider the set semantics, where a tuple is in or out of the database; in particular, we do not consider the setting, where each tuple is associated with a multiplicity that may be positive or negative as in prior work, e.g., [22, 16].
Following prior work, e.g., [4, 16, 19], we decompose the time complexity of dynamic query evaluation into preprocessing time, update time, and enumeration delay. The preprocessing time is the time to compute a data structure that represents the query result before receiving any update. The update time is the time to update the input database and data structure under a single-tuple insert or delete. The enumeration delay is the maximum of three times: the time between the start of the enumeration process and outputting the first tuple, the time between outputting any two consecutive tuples, and the time between outputting the last tuple and the end of the enumeration process [11].
Computation Model and Data Complexity
We consider the RAM model of computation. We assume that each relation is implemented by a data structure of size that can: (1) look up, insert, and delete entries in constant time, and (2) enumerate all stored entries in with constant delay. For a set , where is the set of attributes of R, we use an index data structure that, for any tuple over the attributes in , can: (3) enumerate all tuples in with constant delay, and (4) insert and delete index entries in constant time. We further require the following assumption in Section 6: Given a relation , whose size is polynomial in the input database size , we can lookup for any tuple in in constant time.
We report time complexities as functions of the database size under data complexity (so the query is fixed and has constant size). Constant update time and constant delay therefore mean that they do not depend on the database size.
3 Safe Query Rewriting
A rewriting of a query is a project-join plan for the query. In the context of dynamic query evaluation, query rewritings have been previously used under the term view trees due to their natural tree-shaped graphical depiction [19, 20]. In this paper, we use so-called safe query rewritings, which ensure tractable dynamic evaluation.
A rewriting of a conjunctive query using views (simply, rewriting of ) is a forest of trees where the leaves are the body atoms of and each inner node is a view such that:
-
If has a single child, then is a projection of the child onto some variables; we call a projection view.
-
If has several children, then is the join of its children; we call a join view.
A view is dynamic if the subtree rooted at contains a dynamic atom. For convenience, we also refer to the atoms in a rewriting as views.
Example 9.
Next, we define safe query rewritings, which have four properties. The two correctness properties ensure that the views correctly encode the query result as a factorised representation [25]. The update property guarantees that the dynamic views can be maintained in constant time under single-tuple updates to any dynamic relation. The enumeration property ensures that the tuples in the query result can be listed from the views with constant delay.
Definition 10 (Safe Query Rewriting).
A rewriting is safe for a conjunctive query if:
- Correctness
-
(1) For each connected component of , there is a tree that contains all atoms in . (2) For any projection view with child view , it holds that each atom of containing a variable from is contained only in the subtree rooted at .
- Update
-
The set of free variables of each dynamic view includes the set of free variables of each of its sibling views.
- Enumeration
-
Each tree has a connected set of views in that contains the root of such that
The materialisation time for a query rewriting is the time to materialise all its views (including both static and dynamic views).
Proposition 11.
Let a conjunctive query and a database of size . If has a safe rewriting with materialisation time for some function , then can be evaluated with preprocessing time, update time, and enumeration delay.
Example 12.
The first rewriting in Figure 3 is not safe: It violates the enumeration property because the root view contains the bound variable ; thus there is no guarantee of reporting distinct -values with constant delay. One possibility is to project away before starting the enumeration but this requires time linear in the size of . The rewriting also violates the update property: for instance, the set of free variables of the dynamic view does not include the set of free variables of its sibling , thus computing the change in for an update to requires iterating over linearly many -values from that are paired with the -value from the update.
The second rewriting is also not safe: It satisfies the enumeration property as the root view encodes the query result but fails on the update property for both dynamic atoms.
The third rewriting is safe and admits update time and enumeration delay, per Proposition 11. We next show how to handle updates and enumerate from this rewriting.
We can propagate an update to or to their ancestor views in constant time. Consider an insert of a new tuple to relation . To compute the change in , we check if exists in via a constant-time lookup. If not, we stop as no further propagation is needed; otherwise, we insert into in constant time. We maintain by inserting into that view. We compute the change in the root by checking if exists in via a constant-time lookup, and if so, we insert into the root. Propagating an insert to requires a lookup in and an insert into , both in constant time. Deletes to and are handled analogously. Thus, updates in this rewriting take constant time.
We can enumerate the distinct tuples in the query result using three nested for-loops. The first loop iterates over the -values in ; the second loop iterates over the -values in paired with a -value from the first loop; and the third loop iterates over the -values in paired with an -value from the second loop. Hence, each distinct result tuple can be constructed in constant time.
The time to compute the view is quadratic because in the worst case each -value in is paired with each -value in . All other views in the rewriting are either projection views or semi-joins of one child view with another child view. Thus, the overall computation time for the rewriting is , where is the database size.
4 New Query Classes
In this section we introduce the query classes , , and . We first define syntactic properties of the classes and that guarantee the existence of safe rewritings using views. The class contains queries that do not satisfy these properties.
Definition 13.
A path connecting two atoms and in a conjunctive query is safe if . In particular, the path is body-safe if the two atoms are body atoms and it is head-safe if one atom is a body atom and the other atom is the head atom. A conjunctive query is well-behaved if: (1) all paths connecting dynamic body atoms are body-safe; and (2) all paths connecting a dynamic body atom with the head atom are head-safe.
Example 14.
The queries – from Figure 2 are not well-behaved. The path in connects the two dynamic body atoms and , but it is not body-safe, since . The path in connects the two dynamic body atoms and , but the path is not body-safe, since . The path in connects the dynamic body atom with the head atom , but it is not head-safe, since . The path in connects the two dynamic body atoms and , but it is not body-safe, since .
We can check efficiently whether a query is well-behaved.
Proposition 15.
For any conjunctive query with variables and atoms, we can decide in time whether is well-behaved.
The well-behavedness property of a query implies the -hierarchical property of its dynamic sub-query.
Proposition 16.
-
Any conjunctive query without static relations is q-hierarchical if and only if it is well-behaved.
-
The dynamic sub-query of any well-behaved conjunctive query is q-hierarchical.
In Section 5, we show that every well-behaved query admits a safe rewriting, that is, it can be rewritten into a forest of view trees that support constant update time and constant enumeration delay. To obtain linear preprocessing time, we further require that the query is free-connex -acyclic.
Definition 17 ().
A conjunctive query is in if it is free-connex -acyclic and well-behaved.
For the class , we skip the condition that the query is free-connex -acyclic.
Definition 18 ().
A conjunctive query is in if it is well-behaved.
Example 19.
Consider the queries from Figure 2. The query is in , since it is free-connex -acyclic and well-behaved. The query is well-behaved but not free-connex -acyclic, since adding its head atom to its body yields a cyclic query. Hence, is in . The queries - are not in , since they are not well-behaved, as explained in Example 14.
We now define our most permissive class of tractable queries:
Definition 20 ().
A conjunctive query is in if it is well-behaved or every variable occurring in a dynamic atom also occurs in a static atom.
Example 21.
The query in Figure 2 is not well-behaved but is in , since the variables of the two dynamic atoms and also occur in the static atom .
The queries – are not contained in , since they are not well-behaved (as explained in Example 14) and have variables in dynamic atoms that do not occur in static atoms: The variable in and does not appear in the only static atom , and the variables and in do not appear in the only static atom .
5 Evaluation of Queries in and
This section introduces our evaluation strategy for the and queries. Section 5.1 introduces variable orders, which guide the construction of safe rewritings for such queries, and the preprocessing width of a query based on its variable orders. Section 5.2 shows that the construction of a safe rewriting for queries in can be done in time, where is the size of the input database.
5.1 Variable Orders
We say that two variables in a query are dependent if they appear in the same body atom. A variable order (VO) for a query in is a forest of trees such that: (1) There is a one-to-one mapping between the nodes of and the variables in ; (2) the variables of each body atom in lie on a root-to-leaf path in [26, 25]. We denote by the set of variables in and by the subtree of rooted at . We associate each VO with a dependency function that maps each variable in to the subset of its ancestors on which the variables in depend: . We further extend a VO by adding each body atom of the query as the child of the lowest variable in the VO that belongs to the atom.
We adapt the notion of canonical VOs from the all-dynamic setting [20] to the setting with both static and dynamic relations. A VO is canonical if the set of variables of each dynamic body atom is the set of inner nodes of the root-to-leaf path that ends with . It is free-top if no bound variable is an ancestor of a free variable. It is well-structured if it is canonical and free-top. We denote by the set of well-structured VOs of .
Example 22.
The following proposition generalises the above example to all well-behaved queries:
Proposition 23.
Any well-behaved conjunctive query has a well-structured VO.
Next, we define the preprocessing width of a VO and a query in . For a variable in a VO , let denote the query, whose body is the conjunction of all atoms in and whose free variables are all its variables. The preprocessing width of is defined next using the fractional edge cover number111The fractional edge cover number is used to define an upper bound on both the size of the query result and the time to compute it (Appendix A). for such queries , whereas the preprocessing width of a query is the minimum over the preprocessing widths of its well-structured VOs:
Example 24.
Figure 4 depicts a well-structured VO for the query . We have and . Overall, the preprocessing width of is .
The preprocessing width of any query in is :
Proposition 25.
For any conjunctive query in , it holds .
5.2 Safe Rewriting of Queries in
We show that every query in has a safe rewriting using views. The time to compute the views is , where is the database size and is the preprocessing width of the query.
Prior work uses view trees to maintain queries in the all-dynamic setting [18, 19]. We adapt the view tree construction to the setting over both static and dynamic relations and show that for queries in , the resulting view trees are safe rewritings.
Rewrite(VO ) : rewriting using views | ||
---|---|---|
switch : | ||
1return | ||
2return | ||
3if and is atom return 4let with root view , 5let 6let join of 7let 8return |
Given a VO for a query in , the procedure Rewrite in Figure 5 rewrites using views following a top-down traversal of . Initially, the parameter VO is . If is a set of trees, the procedure creates a view tree for each tree in (Line 1). If is a single atom, the procedure returns the atom (Line 2). If is a single tree with root , it proceeds as follows. If has only one child and this child is an atom, then the procedure returns the atom (Line 3). Otherwise, has several child views. In this case, the procedure creates a join view with free variables . If has a parent node, the procedure also adds on top of the join view a projection view that projects away (Line 8).
Example 26.
Figure 4 shows a well-structured VO for the query and the view tree constructed from using the procedure Rewrite. Observe that we obtain the view tree from by replacing each variable either by a single projection view or by a join view and a projection view on top.
We can verify that the view tree satisfies all four properties of safe rewritings from Definition 10. The set of free variables of each dynamic view includes the set of free variables of each of its siblings; for instance, the sibling views and have the same set of free variables. The views and encode the query result.
The time to compute the view is quadratic in the database size. All other views only need linear time to compute semi-joins or project away a variable. Thus, the materialisation time for this rewriting is quadratic. By Proposition 11, this query thus admits constant update time and constant enumeration delay after quadratic preprocessing time.
Given a well-structured VO for a query in , the procedure Rewrite outputs a safe rewriting for whose materialisation time is a function of the preprocessing width of .
Proposition 27.
For any conjunctive query in , VO in , and database of size , is a safe rewriting for with materialisation time, where is the database size and is the preprocessing width of .
6 Evaluation of Queries in
In this section, we introduce our evaluation strategy for queries in , specifically targeting those in for which the approach from previous sections is not applicable. We begin by demonstrating our strategy with a simple query from this class.
Example 28.
Consider the query from Figure 2. It does not admit a safe rewriting, thus the evaluation strategy described in Section 5 cannot achieve constant update and constant enumeration delay. At a first glance, does not seem tractable as a single-tuple update can lead to linearly many changes to the query result. For example, an insert to may be paired with linearly many -values in . However, the result of is always a subset of the static relation . With additional preprocessing time, we can precompute the effect of each update: We can construct a transition system whose states are the possible subsets of and the updates to the dynamic relations may cause state transitions. In this system, each state enables the query result to be enumerated with constant delay, and transitioning between states occurs in constant time.
Each state corresponds to a database instance and the materialised query result for that instance. Since the static relations are the same across all states, the states only record the content of the query result and of the dynamic relations without any dangling tuples.
For clarity in this section, we interpret a database as a finite set of facts and apply standard set operations to databases. The static and dynamic parts of a database , denoted and , consist of the facts from the static and dynamic relations in , respectively, with . Before explaining how to construct a transition system, we first define maximum dynamic databases.
Definition 29.
Let a conjunctive query in and a database , where . For each dynamic body atom in , let be the result of the sub-query of whose body is the conjunction of the static body atoms of and whose free variables are . The maximum dynamic database for query and database is .
By the definition of , each variable from a dynamic atom appears in at least one static atom of . The sub-query identifies the complete set of -tuples that might contribute to the result of ; any other -tuples have no impact on the result of .
Example 30.
Consider the query and the database from Figure 6 (left). The maximum dynamic database is obtained by evaluating the queries and on . As a result, the maximum dynamic database for query and database consists of and .
We next establish the upper bound on the size of a maximum dynamic database.
Proposition 31.
For any conjunctive query in and database of size , the maximum dynamic database for query and database has size.
We next define the transition system for a query in and a database .
Definition 32.
Consider a conjunctive query in and a database . Let denote the maximum dynamic database for query and database .
A transition state for query and database is a pair , where and is the result of on . The set of transition states for query and database is: .
A transition system for query and database is a tuple , where is the set of transition states for and , is the initial state, is the set of single-tuple updates to , and is a transition function that determines the next state based on the current state and an update. The initial state pairs the database and the result of on .
We construct this transition system during the preprocessing step. For each single-tuple update to a dynamic relation, the transition system is used to move from the current state to another. If the update does not belong to the maximum dynamic database, it has no effect, and the system remains in the same state. The query result is then enumerated from the current state. Next, we demonstrate our evaluation strategy for the query .
Example 33.
Figure 6 shows the states and the transition system constructed for the query and the database . The maximum dynamic database for and , consisting of and , has 8 possible subsets. Each state corresponds to one of these subsets and the result of evaluated on that subset and the static relation .
The initial state , denoted by the blue arrow, corresponds to the database , which includes and , along with the associated empty query result.
The state transitions correspond to single-tuple updates (inserts or deletes) to the dynamic relations and . Only updates involving tuples from are relevant to the result of , specifically updates involving the value in and the values and in . All other updates have no effect on the query result because the updated values do not appear in . For simplicity, we show only the state transitions for relevant updates in Figure 6.
For the same query and a database of size , the size of is the total number of distinct - and -values in , thus . The query result at any state is a subset of and can be computed in time. The transition system captures all possible subsets of , thus it has states in total. The number of transitions from any state is bounded by when considering only updates relevant to the query result.
When constructing the transition system in this case, we create a global index for states, with a lookup time of . For each state, we also build a local index to support transitions to neighbouring states. The local index is constructed by performing a lookup in time in the global index for each state transition. With possible transitions, constructing the local index takes time. The local index is defined over states and provides constant-time lookups, see the computational model in Section 2.
The preprocessing time is given by the number of states times the creation time per state, so . The update time is constant since for each single-tuple update to a dynamic relation, we find in constant time the corresponding transition and move to the target state or ignore the update if it does not match any transition. We enumerate the tuples in the current state trivially with constant delay, as the query result is already materialised.
Consider now an arbitrary query from and a database of size . The constructed transition system consists of states, where , per Definition 32 and Proposition 31. For each state, the corresponding query result can be computed in time using a worst-case optimal join algorithm [24].
We next construct the transitions for each state. Each state has at most transitions to other states when considering only updates relevant to the query result. The global index over the exponentially many states takes time to construct and requires time proportional to for a lookup. For each state, it takes time quadratic in to construct a local index that comprises at most neighbouring states using the global index with lookup time proportional to . Overall, constructing the transition system takes time , where . This matches the complexity in Theorem 5.
7 Lower Bound for Queries Outside
In this section, we outline the proof of the lower bound in Theorem 1. The proof consists of two parts. In the first part, we give a lower bound on the complexity of evaluating the simple queries and . None of these queries is well-behaved and, therefore, not contained in : the first one has the path that connects the dynamic body atoms and but is not body-safe; the second one has the path that connects the dynamic body atom with the head atom but is not head-safe. In the second part of the proof, we show a lower bound on the complexity of evaluating arbitrary conjunctive queries that are not contained in and do not have repeating relation symbols. The argument is as follows. Consider a conjunctive query that does not have repeating relation symbols. By definition of , this means that (1) is not free-connex -acyclic, or (2) it has a path connecting two dynamic body atoms that is not body-safe, or (3) it has a path connecting a dynamic body atom with the head atom that is not head-safe. In Case (1), we cannot achieve constant-delay enumeration of the result after linear preprocessing time (even without processing any update), unless the Boolean Matrix Multiplication conjecture fails [3]. In Case (2), we reduce the evaluation of to the evaluation of . In Case (3), we reduce the evaluation of to the evaluation of . The latter two reductions transfer the lower bound for and to . In the following, we outline the lower bound proofs for and and defer further details to the technical report [14].
The lower bound for is conditioned on the Online Vector-Matrix-Vector Multiplication (OuMv) conjecture, which is implied by the Online Matrix-Vector Multiplication (OMv) conjecture [12] (Appendix A). The proof of the lower bound is inspired by prior work, which reduces the OuMv problem to the evaluation of the variant of where all relations are dynamic [4]. We explain the differences of our reduction to prior work in terms of construction and implication. The reduction in prior work starts with an empty database and encodes the matrix of the OuMv problem into the relation using updates. In our case, it is not possible to do this encoding using updates, since the relation is static. Instead, we do the encoding before the preprocessing stage of the evaluation algorithm for . Since the preprocessing procedure of the evaluation algorithm for is executed on a non-empty database, we need to put a bound on the preprocessing time. Hence, while prior work establishes a lower bound on the update time and enumeration delay regardless of the preprocessing time, our reduction implies a lower bound on the combination of preprocessing time, update time, and enumeration delay:
Proposition 34.
The conjunctive query cannot be evaluated with preprocessing time, update time, and enumeration delay for any , where is the database size, unless the OuMv conjecture fails.
Proof Sketch.
Assume that there is an algorithm that evaluates with update time and enumeration delay after preprocessing time, for some . Consider an matrix and pairs of -dimensional vectors that serve as input to the OuMv problem. We first encode into the relation in time , which leads to a database of size . Then, in each round , we encode the vectors and into and respectively using updates and trigger the enumeration procedure of to obtain from the result of . This takes time. After rounds, we use overall time. This means that we solve the OuMv problem in sub-cubic time, which contradicts the OuMv conjecture.
The reduction of the OMv problem to the evaluation of the query is similar to the above reduction. We encode the matrix into the relation before the preprocessing stage and encode each incoming vector into using updates.
8 Outlook: Tractability Beyond
This work explores the tractability of conjunctive queries over static and dynamic relations. The largest class of tractable queries put forward is . Yet a characterisation of all tractable queries remains open.
In the following, we discuss the evaluation of queries outside . Let a conjunctive query . The reduced dynamic sub-query of is obtained from by omitting all static atoms and all of their variables.222In contrast, the dynamic sub-query of as defined in Section 2 retains all variables that appear in at least one dynamic atom. An immediate observation is that queries whose reduced dynamic sub-query is not -hierarchical are not tractable. This is implied by the intractability of non--hierarchical queries in the all-dynamic setting [4]. One example query whose reduced sub-query is not -hierarchical is from Figure 2. Its reduced dynamic sub-query is . Any dynamic evaluation strategy for , where the variable is fixed to an arbitrary constant, translates into a dynamic evaluation strategy for . This means that any complexity lower bound for the dynamic evaluation of is also a lower bound for .
A question is whether all queries, whose reduced dynamic sub-queries are -hierarchical, are tractable. We discuss two such queries from Figure 2 that are not in : and . The evaluation strategy from Example 33 can be easily extended to . Similar to Example 33, we create a transition system where each state stores -values and assign each -value that is common to and to the state that stores the -values paired with in the result. Any update to or changes the assignment of at most one -value. At any time, we can enumerate the query result by iterating over the -values and enumerating for each of them the tuples in their corresponding state. So is tractable, albeit not in .
The query differs from in that the variable is bound. The above approach for does not allow for constant-delay enumeration of the result of since distinct -values might be assigned to distinct states that share -values. Filtering out duplicates can however incur a non-constant enumeration delay. Therefore, our approach for cannot evaluate tractably.
References
- [1] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
- [2] Albert Atserias, Martin Grohe, and Dániel Marx. Size Bounds and Query Plans for Relational Joins. SIAM J. Comput., 42(4):1737–1767, 2013. doi:10.1137/110859440.
- [3] Guillaume Bagan, Arnaud Durand, and Etienne Grandjean. On Acyclic Conjunctive Queries and Constant Delay Enumeration. In CSL, pages 208–222, 2007. doi:10.1007/978-3-540-74915-8_18.
- [4] Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. Answering Conjunctive Queries Under Updates. In PODS, pages 303–318, 2017. doi:10.1145/3034786.3034789.
- [5] Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. Answering FO+MOD Queries under Updates on Bounded Degree Databases. ACM Trans. Database Syst., 43(2):7:1–7:32, 2018. doi:10.1145/3232056.
- [6] Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. Answering UCQs under Updates and in the Presence of Integrity Constraints. In ICDT, pages 8:1–8:19, 2018. doi:10.4230/LIPICS.ICDT.2018.8.
- [7] Johann Brault-Baron. De la pertinence de l’énumération: complexité en logiques propositionnelle et du premier ordre. PhD thesis, Université de Caen, 2013.
- [8] Johann Brault-Baron. Hypergraph Acyclicity Revisited. ACM Comput. Surv., 49(3):54:1–54:26, 2016. doi:10.1145/2983573.
- [9] Mihai Budiu, Tej Chajed, Frank McSherry, Leonid Ryzhyk, and Val Tannen. DBSP: Automatic Incremental View Maintenance for Rich Query Languages. Proc. VLDB Endow., 16(7):1601–1614, 2023. doi:10.14778/3587136.3587137.
- [10] Samir Datta, Raghav Kulkarni, Anish Mukherjee, Thomas Schwentick, and Thomas Zeume. Reachability is in DynFO. J. ACM, 65(5):33:1–33:24, 2018. doi:10.1145/3212685.
- [11] Arnaud Durand and Etienne Grandjean. First-order Queries on Structures of Bounded Degree are Computable with Constant Delay. ACM Trans. Comput. Logic, 8(4):21, 2007. doi:10.1145/1276920.1276923.
- [12] Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak. Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. In STOC, pages 21–30, 2015. doi:10.1145/2746539.2746609.
- [13] Muhammad Idris, Martín Ugarte, and Stijn Vansummeren. The Dynamic Yannakakis Algorithm: Compact and Efficient Query Processing Under Updates. In SIGMOD, pages 1259–1274, 2017. doi:10.1145/3035918.3064027.
- [14] Ahmet Kara, Zheng Luo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Tractable Conjunctive Queries over Static and Dynamic Relations. CoRR, arXiv:2404.16224, 2024. doi:10.48550/arXiv.2404.16224.
- [15] Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Counting Triangles under Updates in Worst-Case Optimal Time. In ICDT, pages 4:1–4:18, 2019. doi:10.4230/LIPICS.ICDT.2019.4.
- [16] Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Maintaining Triangle Queries under Updates. ACM Trans. Database Syst., 45(3):11:1–11:46, 2020. doi:10.1145/3396375.
- [17] Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries. In PODS, pages 375–392, 2020. doi:10.1145/3375395.3387646.
- [18] Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Conjunctive Queries with Free Access Patterns Under Updates. In ICDT, volume 255 of LIPIcs, pages 17:1–17:20, 2023. doi:10.4230/LIPICS.ICDT.2023.17.
- [19] Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. F-IVM: Analytics over Relational Databases under Updates. The VLDB Journal, 2023. doi:10.1007/S00778-023-00817-W.
- [20] Ahmet Kara, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries. Log. Methods Comput. Sci., 19(3), 2023. doi:10.46298/LMCS-19(3:11)2023.
- [21] Mahmoud Abo Khamis, Ahmet Kara, Dan Olteanu, and Dan Suciu. Insert-Only versus Insert-Delete in Dynamic Query Evaluation. Proc. ACM Manag. Data, 2(5):219:1–219:26, 2024. doi:10.1145/3695837.
- [22] Christoph Koch, Yanif Ahmad, Oliver Kennedy, Milos Nikolic, Andres Nötzli, Daniel Lupei, and Amir Shaikhha. DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views. The VLDB Journal, 23(2):253–278, 2014. doi:10.1007/S00778-013-0348-4.
- [23] Dániel Marx. Approximating Fractional Hypertree Width. ACM Trans. Algorithms, 6(2):29:1–29:17, 2010. doi:10.1145/1721837.1721845.
- [24] Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case Optimal Join Algorithms. J. ACM, 65(3):16:1–16:40, 2018. doi:10.1145/3180143.
- [25] Dan Olteanu and Maximilian Schleich. Factorized Databases. SIGMOD Rec., 45(2):5–16, 2016. doi:10.1145/3003665.3003667.
- [26] Dan Olteanu and Jakub Závodný. Size Bounds for Factorised Representations of Query Results. ACM TODS, 40(1):2:1–2:44, 2015. doi:10.1145/2656335.
- [27] Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. doi:10.2200/S00362ED1V01Y201105DTM016.
- [28] Yufei Tao and Ke Yi. Intersection Joins under Updates. J. Comput. Syst. Sci., 124:41–64, 2022. doi:10.1016/J.JCSS.2021.09.004.
- [29] Qichen Wang, Xiao Hu, Binyang Dai, and Ke Yi. Change Propagation Without Joins. Proc. VLDB Endow., 16(5):1046–1058, 2023. doi:10.14778/3579075.3579080.
Appendix A Further Preliminaries
Definition 35 (Fractional Edge Cover [2]).
Given a conjunctive query and a set , a fractional edge cover of for is a solution to the following linear program:
minimize | ||||
subject to | ||||
The optimal objective value of the above program is called the fractional edge cover number of for the variable set and denoted as . We abbreviate by .
For a query without bound variables, a set , and a database of size , with is an upper bound on the worst-case size of the result of projected onto the variables in [2]. The result of can be computed in time [24].
In this work, we establish complexity lower bounds based on the following widely accepted complexity-theoretic conjectures.
Definition 36 (Online Matrix-Vector Multiplication (OMv) Problem [12]).
We are given an Boolean matrix and receive Boolean column vectors of size , one by one. After seeing each vector , the task is to output the multiplication before seeing the next vector.
Conjecture 37 (OMv Conjecture [12]).
For any constant , there is no algorithm that solves the OMv problem in time .
Definition 38 (Online Vector-Matrix-Vector Multiplication (OuMv) Problem [12]).
We are given an Boolean matrix and receive pairs of Boolean column vectors of size , one by one. After seeing each pair of vectors , the task is to output the multiplication before seeing the next pair.
The following OuMv conjecture is implied by the OMv conjecture.
Conjecture 39 (OuMv Conjecture [12]).
For any constant , there is no algorithm that solves OuMv problem in time .
Appendix B Proof of Proposition 11
Let be a safe rewriting for . The preprocessing time is the materialisation time for .
To enumerate the result of , we nest the enumeration procedures for the connected components of , concatenating their result tuples. For each connected component , the enumeration procedure traverses the tree containing all atoms from in a top-down manner. The enumeration property of guarantees that there exists a subtree of such that the root of is the same as the root of and the set of free variables of the views in consists of the free variables of .
We enumerate the result of by traversing the subtree in preorder. At each view , we fix the values of the variables in , where is the set of free variables of the ancestor views of . We retrieve in constant time a tuple of values over from for the given -value. After visiting all views once, we construct the first complete result tuple for and report it. We continue iterating over the remaining distinct values over in the last visited view , reporting new tuples with constant delay. After exhausting all values from , we backtrack and repeat the enumeration procedure for the next -value. The enumeration stops once all views from the subtree are exhausted. Given that all views are calibrated bottom-up but the enumeration proceeds top-down, the procedure only visits those tuples that appear in the result, thus ensuring constant enumeration delay.
We propagate a constant-sized update through a tree in a bottom-up manner, maintaining each view on the path from the affected relation to the root. From the update property of the safe rewriting , computing the delta of any join view involves only constant-time lookups in the sibling views of the child carrying the update. The size of the delta also remains constant. Computing the delta of a projection view also requires a constant-time projection of its incoming update. Since an update to one relation affects one tree of , propagating an update through takes constant time.
Appendix C Missing Details in Section 4
C.1 Proof of Proposition 15
We first explain how we can check that every path connecting two dynamic body atoms in is body-safe. For each pair of dynamic body atoms of , we first construct the Gaifman graph of the hypergraph of without the variables . The graph contains nodes and edges. We next choose any and and check in time if is reachable from using the Breadth-First Search algorithm; if so, has a path connecting and that is not body-safe. We repeat this procedure for every pair of dynamic body atoms, which gives the total cost of .
We now show how to check that every path connecting a dynamic body atom with the head atom is head-safe. For each dynamic body atom , we construct the Gaifman graph of without the free variables in . For each atom containing a free variable , we choose any and check if is reachable from in the graph; if so, has path connecting with the head atom that is not head-safe. The total time is .
C.2 Proof of Proposition 16
We start with the proof of the second statement in Proposition 16. Consider a well-behaved query . For the sake of contradiction, assume that is not q-hierarchical. This means that one of the following two cases holds: (1) is not hierarchical or (2) is hierarchical but not q-hierarchical. We show that both cases lead to a contradition.
Assume that is not hierarchical. In the following, we denote by the set of dynamic body atoms in that contain the variable . The query must have two variables and such that , , and . This implies that has three dynamic body atoms , , and such that , , , , , and . The path connects the two dynamic atoms and such that . This means that is not body-safe, which is a contradiction to our assumption that is well-behaved.
Assume now that is hierarchical, but not -hierarchical. This implies that contains two dynamic body atoms and , a bound variable with and , and a free variable with and . The path connects the dynamic body atom with the head atom of such that . This means that the path is not head-safe, which is again a contradiction.
Next, we prove the first statement in Proposition 16. The “if”-direction follows directly from the second statement of the proposition. It remains to show the “only-if”-direction. We prove the contraposition of this direction. Consider a query without static relations that is not well-behaved. We show that is not q-hierarchical.
Assume that has a path
that connects two body atoms and such that
, i.e.,
is not body-safe.
Assume that is the shortest such path.
Since is not body-safe, it must hold .
Furthermore, every pair with is contained in a body atom
i.e., .
The following claim implies that is not hierarchical, hence, not q-hierarchical:
Claim 1: such that and
.
We prove Claim 1. Assume that for each , it holds or . Consider an such that (the case for is treated analogously). If , let . Otherwise, let . The path is shorter than and still not body-safe. This contradicts our assumption on the length of .
Assume now that has a path
connecting a dynamic body atom with the head atom of
such that , i.e.,
is not head-safe. Let be the shortest such path.
Without loss of generality, we assume that is hierarchical.
Since is free,
the following claim implies that is not q-hierarchical:
Claim 2: is bound and .
We prove Claim 2. Assume that is free. Since is not head-safe, cannot be included in . Hence, is a path that is shorter than and not head-safe, which contradicts our assumption on the length of . Assume now that . Since is hierarchical and there must be at least one body atom including both and , it must hold . This implies that , since otherwise the free variable is included in , which contradicts our assumption that is not head-safe. This means however that the path is sorter than and not head-safe, which contradicts our assumption on the length of .