FO-Query Enumeration over SLP-Compressed Structures of Bounded Degree
Abstract
Enumerating the result set of a first-order query over a relational structure of bounded degree can be done with linear preprocessing and constant delay. In this work, we extend this result towards the compressed perspective where the structure is given in a potentially highly compressed form by a straight-line program (SLP). Our main result is an algorithm that enumerates the result set of a first-order query over a structure of bounded degree that is represented by an SLP satisfying the so-called apex condition. For a fixed formula, the enumeration algorithm has constant delay and needs a preprocessing time that is linear in the size of the SLP.
Keywords and phrases:
Enumeration algorithms, FO-logic, query evaluation over compressed dataFunding:
Markus L. Schmid: Supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) – project number 522576760 (gefördert durch die Deutsche Forschungsgemeinschaft (DFG) – Projektnummer 522576760).Copyright and License:
2012 ACM Subject Classification:
Theory of computation Database query processing and optimization (theory) ; Theory of computation Logic and databasesEditors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał SkrzypczakSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
First order model checking (i.e., deciding whether an FO-sentence holds in a relational structure , for short) is a classical problem in computer science and its complexity has been thoroughly investigated; see, e.g., [21, 32, 37]. In database theory, it is of importance due to its practical relevance for evaluating SQL-like query languages in relational databases. FO model checking is PSPACE-complete when and are both part of the input, but it becomes fixed-parameter tractable (even linear fixed-parameter tractable) with respect to the parameter when is restricted to a suitable class of relational structures (see the paragraph on related work below for details), while for the class of all structures it is not fixed-parameter tractable modulo certain complexity assumptions. This is relevant, since in practical scenarios queries are often small, especially in comparison to the data (represented by the relational structure) that is often huge.
FO model checking (i.e., checking a Boolean query that returns either true or false) reduces to practical query evaluation tasks and is therefore suitable to transfer lower bounds. However, from a practical point of view, FO-query enumeration is more relevant. It is the problem of enumerating without repetitions for an FO-formula with free variables the result set of all tuples such that . Since can be rather large (exponential in in general), the total time for enumeration is not a good measure for the performance of an enumeration algorithm. More realistic measures are the preprocessing time (used for performing some preprocessing on the input) and the delay, which is the maximum time needed between the production of two consecutive output tuples from . In data complexity (where we consider to be constant), the best we can hope for is linear preprocessing time (i.e., for a computable function ) and constant delay (i.e., the delay is for some computable function and therefore does not depend on ). Over the last two decades, many of the linear time (with respect to data complexity) FO model checking algorithms for various subclasses of structures have been extended to FO-query enumeration algorithms with linear (or quasi-linear) time preprocessing and constant delay (see the paragraph on related work below for the relevant literature).
In this work, we extend FO-query enumeration towards the compressed perspective, i.e., we wish to enumerate the result set in the scenario where is given in a potentially highly compressed form, and we want to work directly on this compressed form without decompressing it. In this regard, we contribute to a recent research effort in database theory that is concerned with query evaluation over compressed data [45, 51, 59, 60]. Let us now explain this framework in more detail.
Query evaluation over compressed data.
Query evaluation over compressed data combines the classical task of query evaluation with the paradigm of algorithmics on compressed data (ACD), i.e., solving computational tasks directly on compressed data objects without prior decompression. ACD is an established algorithmic paradigm and it works very well in the context of grammar-based compression with so-called straight-line programs (SLPs). Such SLPs use grammar-like formalisms in order to specify how a data object is constructed from small building blocks. For example, if the data object is a finite string , then an SLP is just a context-free grammar for the language . For instance, the SLP , , , (where are nonterminals and are terminals) produces the string . While SLPs achieve exponential compression in the best case, there are also fast heuristic compressors that yield decent compression in practical scenarios. Moreover, SLPs are very well suited for ACD; see, e.g., [38].
An important point is that the ACD perspective can lead to dramatic running time improvements: if the same problem can be solved in linear time both in the uncompressed and in the compressed setting (i.e., linear in the compressed size), then in the case that the input can be compressed from size to size (which is possible with SLPs in the best case), the algorithm for the compressed data has a running time of (compared to for the algorithm working on uncompressed data). An important problem that shows this behavior is for instance pattern matching in compressed texts [22].
SLPs are most famous for strings (see [5, 10, 22, 23] for some recent publications and [38] for a survey). What makes them particularly appealing for query evaluation is that their general approach of compressing data objects by grammars extends from strings to more complex structures like trees [24, 40, 42, 44] and hypergraphs (i.e., general relational structures) [36, 39, 46, 47], while, at the same time, their good ACD-properties are maintained to some extend. This is due to the fact that context-free string grammars extend to context-free tree grammars [58] (see also [25]) and to hyperedge replacement grammars [6, 27] (see also [16]).
In this work, we are concerned with FO-query enumeration for relational structures that are compressed by SLPs based on hyperedge replacement grammars (also known as hierarchical graph definitions or SL HR grammars; see the paragraph on related work for references). An example of such an SLP is shown in Figure 1. It consists of productions (shown in Figure 1 on the left) that replace nonterminals (, , and in Figure 1) by their unique right-hand sides. Each right-hand side is a relational structure (a directed graph in Figure 1) together with occurrences of earlier defined nonterminals and certain distinguished contact nodes (labelled by and in Figure 1). In this way, every nonterminal produces a relational structure (the value of ) with distinguished contact nodes. These structures are shown in Figure 1 on the right. When replacing for instance the occurrence of in the right hand side of by , one identifies for every the -labelled contact node in with the node that is connected by the -labelled dotted edge with the -occurrence in the right-hand side of (these are the nodes labelled with and in Figure 1).
Main result.
It is known that FO-query enumeration for degree-bounded structures can be done with linear preprocessing and constant delay [13, 28]. Moreover, FO model checking for SLP-compressed degree-bounded structures can be done efficiently [39]. We combine these two results and therefore extend FO-query enumeration for bounded-degree structures towards the SLP-compressed setting, or, in other words, we extend FO model checking of SLP-compressed structures to the query-enumeration perspective. A preliminary version of our main result is stated below. It restricts to so-called apex SLPs. Roughly speaking, the apex property demands that each graph replacing a nonterminal must not contain other nonterminals at the “contact nodes” (the nodes the nonterminal was incident with). The apex property is well known from graph language theory [16, 17, 18] and has been used for SLPs in [39, 49].
Theorem 1.
Let be a constant. For an FO-formula and a relational structure , whose Gaifman graph has degree at most , and that is given in compressed form by an apex SLP , one can enumerate the result set after preprocessing time and delay for some computable function .
Note that the preprocessing is linear in the compressed size instead of the data size .
We prove this result by extending the FO-query enumeration algorithm for uncompressed structures from [28] to the SLP-compressed setting. For this we have to overcome considerable technical barriers. The algorithm of [28] exploits the Gaifman locality of FO-queries. In the preprocessing phase the algorithm computes for each element the -sphere around for a radius that only depends on the formula . This leads to a preprocessing time of . For an SLP-compressed structure we cannot afford to iterate over all elements of the structure. Inspired by a technique from [39], we will expand every nonterminal of the SLP locally up to a size that depends only on and . This leads to at most substructures of size . Our enumeration algorithm then enumerates certain paths in the derivation tree defined by and for each such path ending in a nonterminal it searches in the precomputed local expansion of for nodes with a certain sphere type.
Related work.
In the uncompressed setting, there are several classes of relational structures for which FO-query enumeration can be solved with linear (or quasi-linear) preprocessing and constant delay, e.g., relational structures with bounded degree [8, 13, 28], low degree [14], (locally) bounded expansion [29, 64], and structures that are tree-like [3, 30] or nowhere dense [61]; see [7, 63] for surveys. Moreover, for conjunctive queries with certain acyclicity conditions, linear preprocessing and constant delay is also possible for the class of all relational structures [4, 7]. The algorithm from [28] is the most relevant one for our work.
Concerning other work on query enumeration on SLP-compressed data, we mention [51, 59, 60], which deals with constant delay enumeration for (a fragment of) MSO-queries on SLP-compressed strings, and [45], which presents a linear preprocessing and constant delay algorithm for MSO-queries on SLP-compressed unranked forests.
SLPs for (hyper)graphs were introduced as hierarchical graph descriptions by Lengauer and Wanke [36] and have been further studied, e.g., in [9, 19, 20, 34, 35, 49, 48, 50]. Model checking problems for SLP-compressed graphs have been studied in [39] for FO and MSO, [26] for fixpoint logics, and [1, 2] for the temporal logics LTL and CTL in the context of hierarchical state machines (which are a particular type of graph SLPs). Particularly relevant for this paper is a result from [39] stating that for every level of the polynomial time hierarchy there is a fixed FO-formula for which the model checking problem for SLP-compressed input graphs is -complete. In contrast, for apex SLPs the model checking problem for every fixed FO-formula belongs to NL (nondeterministic logspace) [39]. This (and the fact that FO-query enumeration reduces to FO model checking) partly explains the restriction to apex SLPs in Theorem 1.
Compression of graphs via graph SLPs has been considered in [57] following a “Sequitur scheme” [52] and in [47] following a “RePair scheme” [33] (see also [41]); note that both compressors produce graph SLPs that may not be apex.
Another recent concept in database theory that is concerned with compressed representations of relational data and query evaluation are factorized databases (see [31, 53, 54, 55, 56]). Intuitively speaking, in a factorized representation of a relational structure each relation is represented as an expression over the relational operators union and product that evaluates to . However, SLPs for relational structures and factorized representations cover completely different aspects of redundancy: A factorized representation is always at least as large as its active domain (i.e., all elements that occur in some tuple), while an SLP for a relational structure can be of logarithmic size in the size of the universe of the structure. On the other hand, small factorized representations do not seem to necessarily translate into small SLPs.
2 General Notations
Let . For every , we set . For a finite alphabet , we denote by the set of all finite strings over including the empty string . For a partial let (where stands for undefined) and . For functions and we define the composition by for all .
A partial -tuple over a set is a partial function . If , then we also say that is a complete -tuple or just a -tuple; in this case we also write in the conventional form . Two partial -tuples and are disjoint if . In this case, their union is the partial -tuple defined by if for and if .
2.1 Directed acyclic graphs
An ordered dag (directed acyclic graph) is a triple , where is a finite set of nodes, is the child-function, the relation is acyclic, and is the initial node. The size of is . A node with is called a leaf.
A path in (from to ) is a sequence such that for all . The length of this path is (we may have in which case ) and we also call a -to- path or -path if the end point is not important. An -path is also called an initial path. We extend this notation to subsets of in the obvious way, e.g., for and we talk about -to- paths, -to- paths, -to-leaf paths (where “leaf” refers to the set of all leaves of the dag), initial-to-leaf paths, etc. For a -to- path and a -to- path we define the -to- path (note that if we just concatenate and as words, then we have to replace the double occurrence by to obtain ). We say that the path is a prefix of the path if there is a path such that .
Since we consider ordered dags, there is a natural lexicographical ordering on all -paths (i.e., all paths that start in the same node ). More precisely, for two different -paths and we write if either is a proper prefix of or we can write and as , for paths and with .
2.2 Relational structures and first order logic
A signature is a finite set consisting of relation symbols () and constant symbols (). Each relation symbol has an associated arity . A structure over the signature is a tuple , where is a finite non-empty set (the universe of ), is the relation associated with the relation symbol , and is the constant associated with the constant symbol . Note that we restrict to finite structures. If the structure is clear from the context, we will identify (, respectively) with the relation symbol (the constant symbol , respectively). Sometimes, when we want to refer to the universe , we will refer to itself. For instance, we write for , or for a function . The size of is . As usual, a constant may be replaced by the unary relation . Thus, in the following, we will only consider signatures without constant symbols, except when we explicitly introduce constants. Let be such a signature (we call it a relational signature) and let be a structure over (we call it a relational structure). For relational structures and over the signature , we write to denote that and are isomorphic. A substructure of is a structure such that and for all . The substructure of induced by is . We define the undirected graph (the so-called Gaifman graph of ), where contains an edge if and only if there is a binary relation () and a tuple with . The degree of is the maximal degree of a node in . If has degree at most , we also say that is a degree- bounded structures.
We use first-order logic (FO) over finite relational structures; see [15] for a detailed introduction and the full version [43] for some standard notations. For an FO-formula over the signature with free variables and a relational structure over and , we write if is true in when the variable is set to for all . Hence, an FO-formula can be interpreted as an FO-query that, for a given structure , defines a result set
The quantifier rank of an FO-formula is inductively defined as follows: if contains no quantifiers, , and .
In the rest of the paper, we assume that the signature only contains relation symbols of arity at most two. It is folklore that FO model checking and FO-query enumeration over arbitrary signatures can be reduced to this case; see the full version [43] for a possible construction. This construction can be carried out in linear time (in the size of the formula and the structure) and it increase the degree of the structure as well as the quantifier rank of the formula by at most one.
2.3 Distances, spheres and neighborhoods
Let us fix a relational signature (containing only relation symbols of arity at most two) and let be a structure over this signature. We say that is connected, if its Gaifman graph is connected. The distance between elements in the graph is denoted by (it can be ). For subsets we define . For two partial tuples (of any arity) over let .
Fix a constant . We will only consider degree- bounded structures in the following. Let us fix such a structure (over the relational signature ). Take additional constant symbols called sphere center constants. For an and a partial -tuple we define the -sphere . The -neighborhood of is obtained by taking the substructure of induced by and then adding every node () as the interpretation of the sphere center constant . Hence, it is a structure over the signature . The -neighborhood of a -tuple has at most elements (here, the inequality holds since we assume ).
We use the above definitions also for a single element in place of a tuple ; formally is identified with the -tuple such that . We are mainly interested in -spheres and -neighborhoods of complete -tuples, but the corresponding notions for partial -tuples will be convenient later. We also drop the subscript from the above notations if this does not cause any confusion.
A -neighborhood type is an isomorphism type for the -neighborhood of a complete -tuple in a degree- bounded structure. More precisely, we can define a -neighborhood type as a degree- bounded structure over the signature such that
-
the universe of is of the form for some and
-
for every there is such that , where, for every , is the interpretation of the sphere center constant .
From each isomorphism class of -neighborhood types we select a unique representative and write for the set of all selected representatives. Then, for every -tuple there is a unique such that ; we call it the -neighborhood type of and say that is a -tuple. In case we speak of -nodes instead of -tuples, write for and call its elements -neighborhood types instead of -neighborhood types.
For every -neighborhood type there is an FO-formula such that for every degree- bounded structure and every -tuple we have if and only if is a -tuple.
2.4 Enumeration algorithms and the machine model
FO-query enumeration is the following problem: Given an FO-formula over some signature and a relational structure over , we want to enumerate all tuples from in some order and without repetitions, i.e., we want to produce a sequence with , and is the end-of-enumeration marker. An algorithm for FO-query enumeration starts with a preprocessing phase in which no output is produced, followed by an enumeration phase, where the elements are produced one after the other. The running time of the preprocessing phase is called the preprocessing time, and the delay measures the maximal time between the computation of two consecutive outputs and for every .
Usually, one restricts the input structure to some subclass of relational structures that is defined by some parameter (in this paper, is the class of degree- bounded structures). We say that an algorithm for FO-query enumeration for has linear preprocessing and constant delay, if its preprocessing time is and its delay is for some computable function . This complexity measure where the query is considered to be constant and the running time is only measured in terms of the data, i.e., the size of the relational structure, is also called data complexity. In data complexity, linear preprocessing and constant delay is considered to be optimal (since we assume that the relational structure has to be read at least once). As mentioned in the introduction, FO-query enumeration can be solved with linear preprocessing and constant delay for several classes .
For proving upper bounds in data complexity, we often have to argue that certain computational tasks can be performed in time (or for some function . In these cases, without explicitly mentioning this in the remainder, will always be a computable function (actually, will be elementary, i.e., bounded by an exponent tower of fixed height). The arguments for will only depend on the parameter and the formula size .
The special feature of this work is that we consider FO-query enumeration in the setting where the relational structure is not given explicitly, but in a potentially highly compressed form, and our enumeration algorithm must handle this compressed representation rather than decompressing it. Then the structure size will be replaced by the size of the compressed representation of in all time bounds. This aspect shall be explained in detail in Section 4.
We use the standard RAM model with uniform cost measure as our model of computation. We will make some further restrictions for the register length tailored to the compressed setting in Section 4.2.
3 FO-Enumeration over Uncompressed Degree-Bounded Structures
In this section, we fix a relational signature , constants and , a degree- bounded structure over the signature , and an FO-formula over the signature with . Our goal is to enumerate the set after a linear time preprocessing in constant delay. Before we consider the case where the structure is given in a compressed form, we will first outline the enumeration algorithm from [28] for the case where is given explicitly (with some modifications). In Section 5 we will extend this algorithm to the compressed setting.
By a standard application of the Gaifman locality of FO (see [43]), we first reduce the enumeration of to the enumeration of all -tuples from for a fixed (for some ). Recall that contains at most elements, and this upper bound only depends on and the formula . To simplify notation, we assume that in the sphere center constant is interpreted by . In particular, the sphere center constants are interpreted by different elements. This is not a real restriction; see [43].
In order to enumerate all -tuples, we will factorize into its connected components. In order to accomplish this, we need the following definitions. We first define the larger radius
| (1) |
Every -neighborhood of an element has at most elements. Recall that a -neighborhood type is a degree- bounded structure over the signature with a universe for some . W.l.o.g. we assume that the sphere center constant is interpreted by the element in a -neighborhood type. Hence, every has distance at most from . Moreover, the -neighborhood types in are pairwise non-isomorphic.
Assume that our fixed -neighborhood type splits into connected components . Thus, every is a connected induced substructure of , every node of belongs to exactly one , and there is no edge in the undirected graph between two different components . Let be the set of sphere centers that belong to the connected component . We must have . Let (we could also fix any other element from ). Every node in has distance at most from some . Since is connected, it follows that every node in has distance at most from (this is in fact true for every instead of ). A consistent factorization of our -neighborhood type is a tuple
with the following properties for all :
-
and is a partial -tuple with and (so, is mapped by to the center of ) and
-
.
Clearly, the number of possible consistent factorizations of is bounded by .
For a -neighborhood type , a -node and a partial -tuple we moreover fix an isomorphism (this isomorphism is not necessarily unique) and define the partial -tuple by for all . Note that, by definition, .
Take a consistent factorization of . We say that an -tuple is admissible for if the following conditions hold:
-
for all , is a -node, and
-
for all with we have
(2)
Finally, with an -tuple we associate the -tuple
Note that .
We claim that in order to enumerate all -tuples , it suffices to enumerate for every consistent factorization of the set of all -tuples that are admissible for . If we can do this, then we replace every output tuple by . Note that can be easily computed in time from the tuple , the isomorphisms , and the partial -tuples . The correctness of this algorithm follows from the following two lemmas (with full proofs in [43]):
Lemma 2.
If is a consistent factorization of and is admissible for then is a -tuple.
Lemma 3.
If is a -tuple then there are a unique consistent factorization of and a unique -tuple that is admissible for and such that .
3.1 Enumeration algorithm for uncompressed structures
Let us fix a -neighborhood type and a consistent factorization of . By Lemmas 3 and 3, it suffices to enumerate (with linear preprocessing and constant delay) the set of all that are admissible for . In the preprocessing phase we compute
-
for every a list containing all -nodes from and
-
for every an isomorphism .
It is straightforward to compute these data in time (in Section 5, where we deal with the more general SLP-compressed case, this is more subtle). We classify each list as being short if and as being long otherwise. Without loss of generality, we assume that, for some the lists are short and the lists are long (note that this includes the cases that all lists are short or all lists are long).
Our enumeration procedure maintains a stack of the form with and for all . Note that if , then we have the empty stack . Such a stack is called admissible for (or just admissible), if for all with and all and we have . Note that the empty stack as well as every stack with are admissible.
The general structure of our enumeration algorithm is a depth-first-left-to-right (DFLR) traversal over all admissible stacks . For this, it calls the recursive procedure extend (shown as Algorithm 1) with the initial admissible stack . Note that whenever extend is called, holds. It is clear that the call extend triggers the enumeration of all admissible stacks . In an implementation one would store as a global variable.
Let us assume that we can check whether a stack is admissible in time (it is not hard to see that this is possible, and this aspect will anyway be discussed in detail for the compressed setting in Section 5). After the initial call extend, the algorithm constructs an admissible stack with (or terminates) after time bounded in and (since the lists are short). If some is non-admissible, i.e., the stack is not admissible, then and therefore for some . Thus, the total number of non-admissible elements from can be bounded by a function of and . Consequently, since is long, the algorithm necessarily finds some admissible (or terminates) after time bounded in and . From this observation, the following lemma can be obtained with moderate effort; see [43].
Lemma 4.
The delay of the above enumeration procedure is bounded by .
4 Straight-Line Programs for Relational Structures
In this section, we introduce the compression scheme that shall be used to compress relational structures. We first need the definition of pointed structures.
For , an -pointed structure is a pair , where is a structure and is injective. The elements in (, respectively) are called contact nodes (internal nodes, respectively). The node is called the -th contact node.
A relational straight-line program (r-SLP or just SLP) is a tuple , where
-
is a relational signature,
-
is a finite set of nonterminals, where every has a rank ,
-
is the initial nonterminal, where , and
-
is a set of productions that contains for every a unique production with a -pointed structure over the signature and a multiset of references of the form , where and is injective.
-
Define the binary relation on as follows: if and only if contains a reference of the form . Then we require that is acyclic. Its transitive closure is a partial order that we call the hierarchical order of .
Let be the size of . We define the ordered dag , where the child-function is defined as follows: Let and let be an enumeration of the references in , where every reference appears in the enumeration as many times as in the multiset . The specific order of the references is not important and assumed to be somehow given by the input encoding of We then define .
We now define for every nonterminal a -pointed relational structure (the value of ). We do this on an intuitive level, a formal definition can be found in the full version [43]. If , then we define . If, on the other hand, , then is obtained from by expanding all references in . A reference is expanded by the following steps: (i) create the disjoint union of and , (ii) merge node with node for every , (iii) remove from , and (iv) add all references from to . Due to the fact that is acyclic, we can keep on expanding references (the original ones from and the new ones added by the expansion operation) in any order until there are no references left. The resulting relational structure is ; see Example 4 and Figure 1 for an illustration.
We define . Since it can be viewed as an ordinary (-pointed) structure. It is not hard to see that and that this upper bound can be also reached. Thus, can be seen as a compressed representation of the structure .
In Section 2.2 we claimed that FO-query enumeration can be reduced to the case where only contains relation symbols of arity at most two (with the details given in the full version [43]). It is easy to see that this reduction can be also done in the SLP-compressed setting simply by applying the reduction to all structures ; details can be again found in [43].
We say that the SLP is apex, if for every and every reference we have . Thus, contact nodes of a right-hand side cannot be accessed by references. Apex SLPs are called -level restricted in [49]. It is easy to compute the maximal degree of nodes in for an apex SLP : for every node in a structure compute as the sum of (i) the degree of in and (ii) for every reference and every with , the degree of in . Then the maximum of all these numbers is the maximal degree of nodes in . The apex property implies a certain locality property for that will be explained in Section 4.1. In the rest of the paper we will mainly consider apex SLPs.
A simple example of a class of graphs that are exponentially compressible with apex SLPs is the class of perfect binary trees. The perfect binary tree of height (with leaves) can be produced by an apex SLP of size . Here is an explicit example for an apex SLP:
Example 5.
Consider the SLP where only contains a binary relation symbol and with , and . The productions of these nonterminals are depicted on the left of Figure 1. For instance, the production consists of the -pointed structure , where the universe of consists of the two red nodes and , and the reference set with , , and (in Figure 1 each is connected by a -labeled dotted line with the nonterminal). The production for nonterminal consists of a -pointed structure (and no references), the contact nodes of which are labeled by and . The structure is shown on the right of Figure 1. It can be obtained by first constructing by replacing the single -reference in by . Note that - and -labeled dotted lines identify the two nodes to be merged with the two contact nodes of , and that has exactly one contact node. Then we replace the -reference in by and both -references in by . This merges (and ) with the contact node of the first (and the second) occurrence of . Red (resp., blue, green) edges and nodes are produced from (resp., , ).
Since no contact node is adjacent to any reference, this SLP is apex. The size of is . The size of is : (for the -production) (for the -production) (for the -production).
4.1 Representation of nodes of an SLP-compressed structure
Let . A node can be uniquely represented by a pair such that is an -path in and one of the following two cases holds:
-
ends in and is an internal node.111The nodes in , i.e., the contact nodes of , are excluded here, because they were already generated by some larger (with respect to the hierarchical order ) nonterminal.
-
and .
We call this the -representation of . The -representations of the nodes of are also called -representations. Note that if is the -representation of a node then for some (since ). We will often identify a node of with its -representation; in particular when . One may view a -representation as a stack . In order to construct outgoing (or incoming) edges of in the structure , one only has to modify this stack at its end; see [43] for more details.
The apex condition implies a kind of locality in that can be nicely formulated in terms of -representations: If two nodes and have distance in the graph then the prefix distance between and (which is the number of edges in and that do not belong to the longest common prefix of and ) is also at most . This property is exploited several times in the paper.
Based on -representations, we can define a natural embedding of into in case . Assume that is a non-empty -to- path in with . Let us write for some nonterminal (we may have ). Let be the unique reference that corresponds to the edge in . We then define the embedding as follows, where is a node in given by its -representation so that is a -path (recall that the path is obtained by concatenating the paths and ; see Section 2.1):
We can extend this definition to the case (where ) by defining as the identity map on . If is the substructure of induced by the set then we write for the substructure of induced by the set . Note that in general we do not have . For instance, if then in there can be edges between contact nodes of that are generated by a nonterminal with .
Recall the definition of the lexicographic order on the set of all -paths of for (see Section 2.1). We define as the position of in the lexicographically sorted list of all -paths of , where we start with (i.e., ; note that is the empty path starting in and hence the lexicographically smallest path among all -paths). For we just write . Later it will be convenient to represent the initial path component of a -representation by the number and call be the lex-representation of the node . The number of initial paths in can be bounded by : the number of initial-to-leaf paths in is bounded by (this is implicitly shown in the proof of [11, Lemma 1]) and the number of all initial paths in is bounded by twice the number of initial-to-leaf paths in . Hence, the numbers have bit length .
Example 6.
Recall the SLP from Example 4 and shown to the right of ’s productions in Figure 1. Then the pairs and (recall that and are the two nodes of ) represent the two red nodes of , and , where is the green node in , represents the rightmost green node of . Its lex-representation is (there are six initial paths in ). As another example, the two leftmost (green) nodes of are represented by the pairs and with the lex-representations and , respectively. For the -to- path in we have and , where is the only reference in .
4.2 Register length in the compressed setting
In the following sections we will develop an enumeration algorithm for the set of all tuples in , where the SLP is part of the input. Recall that may contain many elements. In order to achieve constant delay, we therefore should set the register length in our algorithm to so that we can store elements of . This is in fact a standard assumption for algorithms on SLP-compressed objects. For instance, when dealing with SLP-compressed strings, one usually assumes that registers can store positions in the decompressed string. We only allow additions, subtractions and comparisons on these -bit registers and these operations take constant time (since we assume the uniform cost measure). For registers of length we will also allow pointer operations.
Note that a -representation needs many -bit registers, whereas its lex-representation fits into two registers (one of length ).
5 FO-Enumeration over SLP-Compressed Degree-Bounded Structures
We now have all definitions available in order to state a more precise version of Theorem 1:
Theorem 7.
Given an apex SLP such that is degree- bounded and an FO-formula , we can enumerate the result set with preprocessing time and delay for some computable function . All nodes of are output in their lex-representation.
The general structure of our enumeration algorithm is the same as for the uncompressed setting. In particular, we also use Gaifman-locality to reduce to the problem of enumerating for a fixed the set of all -tuples , which then reduces to the problem of enumerating for all consistent factorizations of the set of all -tuples that are admissible for (see the beginning of Section 3).
Here, a first complication occurs: one important component of the above reduction for the uncompressed setting is that FO model checking on degree- bounded structures can be done in time [62]. For the SLP-compressed setting we do not have a linear time (i. e., in time ) model checking algorithm. Only an NL-algorithm for apex SLPs is known [39]. It is not hard to obtain a linear time algorithm from the NL-algorithm in [39]. Alternatively, one can also bypasses model checking; see the full version [43].
Consequently, as in the uncompressed setting, it suffices to consider a fixed consistent factorization of and to enumerate the set of all -tuples in that are admissible for . As before we define the larger radius ; see (1).
5.1 Expansions of nonterminals
In this section we introduce the concept of -expansions for a constant (later, will be a constant of the form ), which will be needed to transfer the enumeration algorithm for the uncompressed setting (Section 3.1) to the SLP-compressed setting. The idea is to apply the productions from , starting with a nonterminal , until all nodes of that have distance at most from the nodes in the right-hand side of (except for the contact nodes of ) are produced. For a nonterminal we define
These are the internal nodes of (written in -representation) that are directly produced with the production . Let be a list of all nodes from . We then define the -expansion as the following induced substructure of :
We always assume that the nodes of are represented by their -representations. Let
be the boundary of . A valid substructure of is an induced substructure of with . If is a valid substructure of and is an -to- path in , then any neighbor of in the graph belongs to . Moreover, , since all contact nodes are excluded from a valid substructure of . In the following, we consider the radius . For a nonterminal we write for the expansion in the rest of the paper.
Fix a -neighborhood type . A node is called a valid -node in if (i) and (ii) is a valid substructure of . We say that is -useful if there is a valid -node in . We consider now the following two sets:
We define a mapping as follows. Let , where is an -to- path in and let be the -representation of . We then define (where the latter is a -representation that we identify as usual with a node from ). The following lemma is proved in the full version [43].
Lemma 8.
The mapping is a bijection from to .
5.2 Overview of the enumeration algorithm
Our goal is to carry out the algorithm described in Section 3.1, but in the compressed setting, i.e., by only using the apex SLP instead of the explicit structure . As in the uncompressed setting, it suffices to consider a fixed -neighborhood type together with a fixed consistent factorization
| (3) |
of and to enumerate the set of all -tuples in that are admissible for . In the following we sketch the algorithm; details can be found in the full version [43].
Enumeration of all -nodes.
The algorithm for the uncompressed setting (Section 3) precomputes for every a list of all -nodes of the structure . This is no longer possible in the compressed setting since the structure is too big. However, as shown in Section 5.1, there is a bijection between the set of -nodes in and the set of all pairs , where is an -to- path in for a -useful nonterminal and is a valid -node in that is written in its -representation . Hence, on a high level, instead of explicitly precomputing the lists of all -nodes, we enumerate them with Algorithm 2.
To execute this algorithm we first have to compute in the preprocessing all expansions for a nonterminal . This is easy: using a breath-first-search (BFS), we locally generate starting with the nodes in until all nodes with are generated. The size of is bounded by (the size of a -sphere around a tuple of length at most in a degree- bounded structure) and can be constructed in time . Summing over all shows that all -expansions can be precomputed in time .
With the available, we can easily precompute brute-force the set of all valid -nodes in (needed in Line 2 of Algorithm 2) and then the set of all -useful nonterminals (needed in Line 1 of Algorithm 2). Recall that is -useful iff there is a valid -node in . Moreover, for every valid -node we compute also an isomorphism . The time for this is bounded by for one nonterminal and hence by in total.
The most challenging part of Algorithm 2 is the enumeration of all initial paths in that end in a -useful nonterminal (Line 1). Let be the set of these paths. In constant delay, we cannot afford to output a path as a list of edges (it does not fit into a constant number of registers in our machine model, see Section 4.2). That is why we return the number (which fits into a single register in our machine model) in Line 3. The idea for constant-delay path enumeration is to run over all paths in lexicographical order and thereby maintain the number . The path is internally stored in a contracted form. If would be a binary dag, then we could use an enumeration algorithm from [42], where maximal subpaths of left (right, respectively) outgoing edges are contracted to single edges. In our setting, is not a binary dag, therefore we have to adapt the technique from [42] slightly; see [43].
Producing the final output tuples.
Note that for each enumerated -node we have to produce the partial -tuple (then the final output tuple is ). Let us first recall that in the uncompressed setting each partial -tuple is defined by for all , where is a precomputed isomorphism. In the compressed setting, Algorithm 2 outputs every -node as a triple , where the initial path ends in some -useful nonterminal and is a valid -node in . Moreover, we have a precomputed isomorphism , which yields the isomorphism . Then, for every we can easily compute the lex-representation of . We first compute in its -representation using the precomputed mapping . Then the lex-representation of is , where . Here, is produced by Algorithm 2. The path has length at most (this is a consequence of the apex condition for ). Its -number can be computed by summing at most many edge weights that were computed in the preprocessing phase.
Count total number of -neighborhoods.
In Section 3.1 we distinguish between short and long lists . Since in our compressed setting, Algorithm 2 replaces the precomputed list we have to count the number of triples produced by Algorithm 2 (of course, before we run the algorithm) in the preprocessing phase. This is easy: the number of output triples can be computed by summing over all -useful nonterminals the product of (i) the number of -to- paths in and (ii) the number of valid -nodes in . The latter can be computed in the preprocessing phase. Computing the number of -to- paths (for all ) involves a top-down pass (starting in ) over with many additions on -bit numbers in total.
Checking distance constraints.
Recall that we fixed the consistent factorization from (3) of the fixed -neighborhood type and want to enumerate all tuples that are admissible for . The definition of an admissible tuple also requires to check whether for all (see (2)). The nodes are enumerated with Algorithm 2, hence the following assumptions hold for all :
-
is given by a triple ,
-
is an initial-to- path in (for some -useful nonterminal ), and
-
is a node (written in -representation) from such that has -neighborhood type in and is a valid substructure of .
In a first step, we show that if then there is a path of length at most such that or . For this, the apex property for is important, since it lower bounds the distance between two nodes and of by the prefix distance between the paths and (i.e., the total number of edges that do not belong to the longest common prefix of and ).
We then proceed in two steps: We first check in time whether or for some path of length at most . For checking (the case is analogous) we check whether (by checking ) and if this is not the case, we repeatedly remove the last edge of (for at most times) and check whether the resulting path equals . However, the whole procedure is complicated by the fact that and are given in a contracted form, where some subpaths are contracted to single edges (see the above paragraph on the path enumeration algorithm for ).
In the second step we have to check in time whether , assuming that for some path of length at most . This boils down to checking, for every and , whether , which is the case iff , where correspond to in the sense that and . For this we locally construct by starting a BFS in and then computing all elements of with distance at most from just like we constructed all expansions (as explained above). This concludes our proof sketch for Theorem 7.
6 Conclusions and Outlook
We presented an enumeration algorithm for FO-queries on structures that are represented succinctly by apex SLPs. Assuming that the formula is fixed and the degree of the structure is bounded by a constant, the preprocessing time of our algorithm is linear and the delay is constant.
There are several possible directions into which our result can be extended. One option is to use more general formalisms for graph compression. Our SLPs are based on Courcelle’s HR (hyperedge replacement) algebra, which it tightly related to tree width [12, Section 2.3]. Our SLPs can be viewed as dag-compressed expressions in the HR algebra, where the leaves can be arbitrary pointed structures; see [39] for more details. Another (and in some sense more general) graph algebra is the VR algebra, which is tightly related to clique width [12, Section 2.5]. It is straightforward to define a notion of SLPs based on the VR algebra and this leads to the question whether our result also holds for the resulting VR-algebra-SLPs.
Another interesting question is to what extend the results on enumeration for conjunctive queries [4, 7] can be extended to the compressed setting. In this context, it is interesting to note that model checking for a fixed existential FO-formula on SLP-compressed structures (without the apex restriction) belongs to NL. It would be interesting to see, whether the constant delay enumeration algorithm from [4] for free-connex acyclic conjunctive queries can be extended to SLP-compressed structures.
Finally, one may ask whether in our main result (Theorem 7) the apex restriction is really needed. More precisely, consider an SLP such that has degree . Is it possible to construct from in time an equivalent apex SLP of size for a computable function ? If this is true then one could enforce the apex property in the preprocessing. In [17] it shown that a set of graphs of bounded degree that can be produced by a hyperedge replacement grammar (HRG) can be also produced by an apex HRG, but the size blow-up is not analyzed with respect to the parameter and the size of .
References
- [1] Rajeev Alur, Michael Benedikt, Kousha Etessami, Patrice Godefroid, Thomas W. Reps, and Mihalis Yannakakis. Analysis of recursive state machines. ACM Transactions on Programming Languages and Systems (TOPLAS), 27(4):86–818, 2005. doi:10.1145/1075382.1075387.
- [2] Rajeev Alur and Mihalis Yannakakis. Model checking of hierarchical state machines. ACM Transactions on Programming Languages and Systems (TOPLAS), 23(3):273–303, 2001. doi:10.1145/503502.503503.
- [3] Guillaume Bagan. MSO queries on tree decomposable structures are computable with linear delay. In Proceedings of the 20th International Workshop on Computer Science Logic, CSL 2006, volume 4207 of Lecture Notes in Computer Science, pages 167–181. Springer, 2006. doi:10.1007/11874683_11.
- [4] Guillaume Bagan, Arnaud Durand, and Etienne Grandjean. On acyclic conjunctive queries and constant delay enumeration. In Proceedings of the 21st International Workshop on Computer Science Logic, CSL 2007, volume 4646 of Lecture Notes in Computer Science, pages 208–222. Springer, 2007. doi:10.1007/978-3-540-74915-8_18.
- [5] Hideo Bannai, Momoko Hirayama, Danny Hucke, Shunsuke Inenaga, Artur Jeż, Markus Lohrey, and Carl Philipp Reh. The smallest grammar problem revisited. IEEE Transaction on Information Theory, 67(1):317–328, 2021. doi:10.1109/TIT.2020.3038147.
- [6] Michel Bauderon and Bruno Courcelle. Graph expressions and graph rewritings. Mathematical System Theory, 20(2-3):83–127, 1987. doi:10.1007/BF01692060.
- [7] Christoph Berkholz, Fabian Gerhardt, and Nicole Schweikardt. Constant delay enumeration for conjunctive queries: a tutorial. ACM SIGLOG News, 7(1):4–33, 2020. doi:10.1145/3385634.3385636.
- [8] Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. Answering FO+MOD queries under updates on bounded degree databases. ACM Transactions on Database Systems, 43(2):7:1–7:32, 2018. doi:10.1145/3232056.
- [9] Romain Brenguier, Stefan Göller, and Ocan Sankur. A comparison of succinctly represented finite-state systems. In Proceedings of the 23rd International Conference on Concurrency Theory, CONCUR 2012, volume 7454 of Lecture Notes in Computer Science, pages 147–161. Springer, 2012. doi:10.1007/978-3-642-32940-1_12.
- [10] Katrin Casel, Henning Fernau, Serge Gaspers, Benjamin Gras, and Markus L. Schmid. On the complexity of the smallest grammar problem over fixed alphabets. Theory of Computing Systems, 65(2):344–409, 2021. doi:10.1007/s00224-020-10013-w.
- [11] Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. IEEE Transactions on Information Theory, 51(7):2554–2576, 2005. doi:10.1109/TIT.2005.850116.
- [12] Bruno Courcelle and Joost Engelfriet. Graph Structure and Monadic Second-Order Logic - A Language-Theoretic Approach, volume 138 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2012. doi:10.1017/CBO9780511977619.
- [13] Arnaud Durand and Etienne Grandjean. First-order queries on structures of bounded degree are computable with constant delay. ACM Transactions on Computational Logic, 8(4):21, 2007. doi:10.1145/1276920.1276923.
- [14] Arnaud Durand, Nicole Schweikardt, and Luc Segoufin. Enumerating answers to first-order queries over databases of low degree. Logical Methods in Computer Science, 18(2), 2022. doi:10.46298/LMCS-18(2:7)2022.
- [15] Heinz-Dieter Ebbinghaus and Jörg Flum. Finite model theory. Perspectives in Mathematical Logic. Springer, 1995. doi:10.1007/3-540-28788-4.
- [16] Joost Engelfriet. Context-free graph grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, Volume 3: Beyond Words, pages 125–213. Springer, 1997. doi:10.1007/978-3-642-59126-6_3.
- [17] Joost Engelfriet, Linda Heyker, and George Leih. Context-free graph languages of bounded degree are generated by apex graph grammars. Acta Informatica, 31(4):341–378, 1994. doi:10.1007/BF01178511.
- [18] Joost Engelfriet and Grzegorz Rozenberg. A comparison of boundary graph grammars and context-free hypergraph grammars. Information and Computation, 84(2):163–206, 1990. doi:10.1016/0890-5401(90)90038-J.
- [19] Rachel Faran and Orna Kupferman. LTL with arithmetic and its applications in reasoning about hierarchical systems. In LPAR-22. 22nd International Conference on Logic for Programming, Artificial Intelligence and Reasoning, volume 57 of EPiC Series in Computing, pages 343–362. EasyChair, 2018. doi:10.29007/WPG3.
- [20] Rachel Faran and Orna Kupferman. A parametrized analysis of algorithms on hierarchical graphs. International Journal on Foundations of Computer Science, 30(6-7):979–1003, 2019. doi:10.1142/S0129054119400252.
- [21] Jörg Flum and Martin Grohe. Parameterized Complexity Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2006. doi:10.1007/3-540-29953-X.
- [22] Moses Ganardi and Pawel Gawrychowski. Pattern matching on grammar-compressed strings in linear time. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 2833–2846. SIAM, 2022. doi:10.1137/1.9781611977073.110.
- [23] Moses Ganardi, Artur Jeż, and Markus Lohrey. Balancing straight-line programs. Journal of the ACM, 68(4):27:1–27:40, 2021. doi:10.1145/3457389.
- [24] Adrià Gascón, Markus Lohrey, Sebastian Maneth, Carl Philipp Reh, and Kurt Sieber. Grammar-based compression of unranked trees. Theory of Computing Systems, 64(1):141–176, 2020. doi:10.1007/s00224-019-09942-y.
- [25] Ferenc Gécseg and Magnus Steinby. Tree languages. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, Volume 3: Beyond Words, pages 1–68. Springer, 1997. doi:10.1007/978-3-642-59126-6_1.
- [26] Stefan Göller and Markus Lohrey. Fixpoint logics on hierarchical structures. Theory of Computing Systems, 48(1):93–131, 2009. doi:10.1007/s00224-009-9227-1.
- [27] Annegret Habel and Hans-Jörg Kreowski. Some structural aspects of hypergraph languages generated by hyperedge replacement. In Proceedings of the 4th Annual Symposium on Theoretical Aspects of Computer Science, STACS 1987, volume 247 of Lecture Notes in Computer Science, pages 207–219. Springer, 1987. doi:10.1007/BFB0039608.
- [28] Wojciech Kazana and Luc Segoufin. First-order query evaluation on structures of bounded degree. Logical Methods in Computer Science, 7(2), 2011. doi:10.2168/LMCS-7(2:20)2011.
- [29] Wojciech Kazana and Luc Segoufin. Enumeration of first-order queries on classes of structures with bounded expansion. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, pages 297–308. ACM, 2013. doi:10.1145/2463664.2463667.
- [30] Wojciech Kazana and Luc Segoufin. Enumeration of monadic second-order queries on trees. ACM Transactions on Computational Logic, 14(4):25:1–25:12, 2013. doi:10.1145/2528928.
- [31] Benny Kimelfeld, Wim Martens, and Matthias Niewerth. A formal language perspective on factorized representations. In Proceedings of the 28th International Conference on Database Theory, ICDT 2025, volume 328 of LIPIcs, pages 20:1–20:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPICS.ICDT.2025.20.
- [32] Stephan Kreutzer and Anuj Dawar. Parameterized complexity of first-order logic. Electronic Colloquium on Computational Complexity, TR09-131, 2009. URL: https://eccc.weizmann.ac.il/report/2009/131.
- [33] N. Jesper Larsson and Alistair Moffat. Off-line dictionary-based compression. Proceedings of the IEEE, 88(11):1722–1732, 2000. doi:10.1109/5.892708.
- [34] Thomas Lengauer. Hierarchical planarity testing algorithms. Journal of the ACM, 36(3):474–509, 1989. doi:10.1145/65950.65952.
- [35] Thomas Lengauer and Klaus W. Wagner. The correlation between the complexities of the nonhierarchical and hierarchical versions of graph problems. Journal of Computer and System Sciences, 44:63–93, 1992. doi:10.1016/0022-0000(92)90004-3.
- [36] Thomas Lengauer and Egon Wanke. Efficient solution of connectivity problems on hierarchically defined graphs. SIAM Journal on Computing, 17(6):1063–1080, 1988. doi:10.1137/0217068.
- [37] Leonid Libkin. Elements of Finite Model Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2004. doi:10.1007/978-3-662-07003-1.
- [38] Markus Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups Complexity Cryptology, 4(2):241–299, 2012. doi:10.1515/gcc-2012-0016.
- [39] Markus Lohrey. Model-checking hierarchical structures. Journal of Computer and System Sciences, 78(2):461–490, 2012. doi:10.1016/J.JCSS.2011.05.006.
- [40] Markus Lohrey. Grammar-based tree compression. In Proceedings of the 19th International Conference on Developments in Language Theory, DLT 2015, volume 9168 of Lecture Notes in Computer Science, pages 46–57. Springer, 2015. doi:10.1007/978-3-319-21500-6_3.
- [41] Markus Lohrey, Sebastian Maneth, and Roy Mennicke. XML tree structure compression using RePair. Information Systems, 38(8):1150–1167, 2013. doi:10.1016/J.IS.2013.06.006.
- [42] Markus Lohrey, Sebastian Maneth, and Carl Philipp Reh. Constant-time tree traversal and subtree equality check for grammar-compressed trees. Algorithmica, 80(7):2082–2105, 2018. doi:10.1007/s00453-017-0331-3.
- [43] Markus Lohrey, Sebastian Maneth, and Markus L. Schmid. FO-query enumeration over SLP-compressed structures of bounded degree, 2025. arXiv:2506.19421.
- [44] Markus Lohrey, Sebastian Maneth, and Manfred Schmidt-Schauß. Parameter reduction and automata evaluation for grammar-compressed trees. Journal of Computer and System Sciences, 78(5):1651–1669, 2012. doi:10.1016/j.jcss.2012.03.003.
- [45] Markus Lohrey and Markus L. Schmid. Enumeration for MSO-queries on compressed trees. Proceedings of the ACM on Management of Data, 2(2):78, 2024. doi:10.1145/3651141.
- [46] Sebastian Maneth and Fabian Peternek. Grammar-based graph compression. Information Systems, 76:19–45, 2018. doi:10.1016/J.IS.2018.03.002.
- [47] Sebastian Maneth and Fabian Peternek. Constant delay traversal of grammar-compressed graphs with bounded rank. Information and Computation, 273:104520, 2020. doi:10.1016/J.IC.2020.104520.
- [48] Madhav V. Marathe, Harry B. Hunt III, Richard Edwin Stearns, and Venkatesh Radhakrishnan. Approximation algorithms for PSPACE-hard hierarchically and periodically specified problems. SIAM Journal on Computing, 27(5):1237–1261, 1998. doi:10.1137/S0097539795285254.
- [49] Madhav V. Marathe, Harry B. Hunt III, and S. S. Ravi. The complexity of approximation PSPACE-complete problems for hierarchical specifications. Nordic Journal of Computing, 1(3):275–316, 1994. URL: https://www.cs.helsinki.fi/njc/njc1_papers/number3/paper1.pdf.
- [50] Madhav V. Marathe, Venkatesh Radhakrishnan, Harry B. Hunt III, and S. S. Ravi. Hierarchically specified unit disk graphs. Theoretical Computer Science, 174(1–2):23–65, 1997. doi:10.1016/S0304-3975(96)00008-4.
- [51] Martin Muñoz and Cristian Riveros. Constant-delay enumeration for SLP-compressed documents. Logical Methods in Computer Science, 21(1), 2025. doi:10.46298/LMCS-21(1:17)2025.
- [52] Craig G. Nevill-Manning and Ian H. Witten. Identifying hierarchical structure in sequences: A linear-time algorithm. Journal of Artificial Intelligence Research, 7:67–82, 1997. doi:10.1613/JAIR.374.
- [53] Dan Olteanu. Factorized databases: A knowledge compilation perspective. In Beyond NP, Papers from the 2016 AAAI Workshop, Phoenix, Arizona, USA, February 12, 2016, 2016. URL: http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12638.
- [54] Dan Olteanu and Maximilian Schleich. F: regression models over factorized views. Proceedings of the VLDB Endowment, 9(13):1573–1576, 2016. doi:10.14778/3007263.3007312.
- [55] Dan Olteanu and Maximilian Schleich. Factorized databases. ACM SIGMOD Record, 45(2):5–16, 2016. doi:10.1145/3003665.3003667.
- [56] Dan Olteanu and Jakub Závodný. Size bounds for factorised representations of query results. ACM Transactions on Database Systems, 40(1):2:1–2:44, 2015. doi:10.1145/2656335.
- [57] Leonid Peshkin. Structure induction by lossless graph compression. In Proceedings of the 2007 Data Compression Conference, DCC 2007, pages 53–62. IEEE Computer Society, 2007. doi:10.1109/DCC.2007.73.
- [58] William C. Rounds. Mappings and grammars on trees. Mathematical System Theory, 4(3):257–287, 1970. doi:10.1007/BF01695769.
- [59] Markus L. Schmid and Nicole Schweikardt. Spanner evaluation over SLP-compressed documents. In Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2021, pages 153–165. ACM, 2021. doi:10.1145/3452021.3458325.
- [60] Markus L. Schmid and Nicole Schweikardt. Query evaluation over SLP-represented document databases with complex document editing. In Proceedings of the International Conference on Management of Data, PODS 2022, pages 79–89. ACM, 2022. doi:10.1145/3517804.3524158.
- [61] Nicole Schweikardt, Luc Segoufin, and Alexandre Vigny. Enumeration for FO queries over nowhere dense graphs. Journal of the ACM, 69(3):22:1–22:37, 2022. doi:10.1145/3517035.
- [62] Detlef Seese. Linear time computable problems and first-order descriptions. Mathematical Structures in Computer Science, 6(6):505–526, 1996. doi:10.1017/S0960129500070079.
- [63] Luc Segoufin. Constant delay enumeration for conjunctive queries. ACM SIGMOD Record, 44(1):10–17, 2015. doi:10.1145/2783888.2783894.
- [64] Luc Segoufin and Alexandre Vigny. Constant delay enumeration for FO queries over databases with local bounded expansion. In Proceedings of the 20th International Conference on Database Theory, ICDT 2017, volume 68 of LIPIcs, pages 20:1–20:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPICS.ICDT.2017.20.
