Abstract 1 Introduction 2 General Notations 3 FO-Enumeration over Uncompressed Degree-Bounded Structures 4 Straight-Line Programs for Relational Structures 5 FO-Enumeration over SLP-Compressed Degree-Bounded Structures 6 Conclusions and Outlook References

FO-Query Enumeration over SLP-Compressed Structures of Bounded Degree

Markus Lohrey ORCID University of Siegen, Germany Sebastian Maneth ORCID University of Bremen, Germany Markus L. Schmid ORCID Humboldt-Universität zu Berlin, Germany
Abstract

Enumerating the result set of a first-order query over a relational structure of bounded degree can be done with linear preprocessing and constant delay. In this work, we extend this result towards the compressed perspective where the structure is given in a potentially highly compressed form by a straight-line program (SLP). Our main result is an algorithm that enumerates the result set of a first-order query over a structure of bounded degree that is represented by an SLP satisfying the so-called apex condition. For a fixed formula, the enumeration algorithm has constant delay and needs a preprocessing time that is linear in the size of the SLP.

Keywords and phrases:
Enumeration algorithms, FO-logic, query evaluation over compressed data
Funding:
Markus L. Schmid: Supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) – project number 522576760 (gefördert durch die Deutsche Forschungsgemeinschaft (DFG) – Projektnummer 522576760).
Copyright and License:
[Uncaptioned image] © Markus Lohrey, Sebastian Maneth, and Markus L. Schmid; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Database query processing and optimization (theory)
; Theory of computation Logic and databases
Related Version:
Full Version: https://arxiv.org/abs/2506.19421 [43]
Editors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

1 Introduction

First order model checking (i.e., deciding whether an FO-sentence ϕ holds in a relational structure 𝒰, 𝒰ϕ for short) is a classical problem in computer science and its complexity has been thoroughly investigated; see, e.g., [21, 32, 37]. In database theory, it is of importance due to its practical relevance for evaluating SQL-like query languages in relational databases. FO model checking is PSPACE-complete when ϕ and 𝒰 are both part of the input, but it becomes fixed-parameter tractable (even linear fixed-parameter tractable) with respect to the parameter |ϕ| when 𝒰 is restricted to a suitable class of relational structures (see the paragraph on related work below for details), while for the class of all structures it is not fixed-parameter tractable modulo certain complexity assumptions. This is relevant, since in practical scenarios queries are often small, especially in comparison to the data (represented by the relational structure) that is often huge.

FO model checking (i.e., checking a Boolean query that returns either true or false) reduces to practical query evaluation tasks and is therefore suitable to transfer lower bounds. However, from a practical point of view, FO-query enumeration is more relevant. It is the problem of enumerating without repetitions for an FO-formula ϕ(x1,,xk) with free variables x1,,xk the result set ϕ(𝒰) of all tuples (a1,,ak)𝒰k such that 𝒰ϕ(a1,,ak). Since ϕ(𝒰) can be rather large (exponential in k in general), the total time for enumeration is not a good measure for the performance of an enumeration algorithm. More realistic measures are the preprocessing time (used for performing some preprocessing on the input) and the delay, which is the maximum time needed between the production of two consecutive output tuples from ϕ(𝒰). In data complexity (where we consider |ϕ| to be constant), the best we can hope for is linear preprocessing time (i.e., f(|ϕ|)|𝒰| for a computable function f) and constant delay (i.e., the delay is f(|ϕ|) for some computable function f and therefore does not depend on |𝒰|). Over the last two decades, many of the linear time (with respect to data complexity) FO model checking algorithms for various subclasses of structures have been extended to FO-query enumeration algorithms with linear (or quasi-linear) time preprocessing and constant delay (see the paragraph on related work below for the relevant literature).

In this work, we extend FO-query enumeration towards the compressed perspective, i.e., we wish to enumerate the result set ϕ(𝒰) in the scenario where 𝒰 is given in a potentially highly compressed form, and we want to work directly on this compressed form without decompressing it. In this regard, we contribute to a recent research effort in database theory that is concerned with query evaluation over compressed data [45, 51, 59, 60]. Let us now explain this framework in more detail.

Query evaluation over compressed data.

Query evaluation over compressed data combines the classical task of query evaluation with the paradigm of algorithmics on compressed data (ACD), i.e., solving computational tasks directly on compressed data objects without prior decompression. ACD is an established algorithmic paradigm and it works very well in the context of grammar-based compression with so-called straight-line programs (SLPs). Such SLPs use grammar-like formalisms in order to specify how a data object is constructed from small building blocks. For example, if the data object is a finite string w, then an SLP is just a context-free grammar for the language {w}. For instance, the SLP SAA, ABBC, Bba, Ccb (where S,A,B,C are nonterminals and a,b,c are terminals) produces the string babacbbabacb. While SLPs achieve exponential compression in the best case, there are also fast heuristic compressors that yield decent compression in practical scenarios. Moreover, SLPs are very well suited for ACD; see, e.g., [38].

An important point is that the ACD perspective can lead to dramatic running time improvements: if the same problem can be solved in linear time both in the uncompressed and in the compressed setting (i.e., linear in the compressed size), then in the case that the input can be compressed from size n to size 𝒪(logn) (which is possible with SLPs in the best case), the algorithm for the compressed data has a running time of 𝒪(logn) (compared to 𝒪(n) for the algorithm working on uncompressed data). An important problem that shows this behavior is for instance pattern matching in compressed texts [22].

SLPs are most famous for strings (see [5, 10, 22, 23] for some recent publications and [38] for a survey). What makes them particularly appealing for query evaluation is that their general approach of compressing data objects by grammars extends from strings to more complex structures like trees [24, 40, 42, 44] and hypergraphs (i.e., general relational structures) [36, 39, 46, 47], while, at the same time, their good ACD-properties are maintained to some extend. This is due to the fact that context-free string grammars extend to context-free tree grammars [58] (see also [25]) and to hyperedge replacement grammars [6, 27] (see also [16]).

In this work, we are concerned with FO-query enumeration for relational structures that are compressed by SLPs based on hyperedge replacement grammars (also known as hierarchical graph definitions or SL HR grammars; see the paragraph on related work for references). An example of such an SLP is shown in Figure 1. It consists of productions (shown in Figure 1 on the left) that replace nonterminals (S, A, and B in Figure 1) by their unique right-hand sides. Each right-hand side is a relational structure (a directed graph in Figure 1) together with occurrences of earlier defined nonterminals and certain distinguished contact nodes (labelled by 1 and 2 in Figure 1). In this way, every nonterminal X{S,A,B} produces a relational structure 𝗏𝖺𝗅(X) (the value of X) with distinguished contact nodes. These structures are shown in Figure 1 on the right. When replacing for instance the occurrence of B in the right hand side of S by 𝗏𝖺𝗅(B), one identifies for every i{1,2} the i-labelled contact node in 𝗏𝖺𝗅(B) with the node that is connected by the i-labelled dotted edge with the B-occurrence in the right-hand side of S (these are the nodes labelled with u and v in Figure 1).

Main result.

It is known that FO-query enumeration for degree-bounded structures can be done with linear preprocessing and constant delay [13, 28]. Moreover, FO model checking for SLP-compressed degree-bounded structures can be done efficiently [39]. We combine these two results and therefore extend FO-query enumeration for bounded-degree structures towards the SLP-compressed setting, or, in other words, we extend FO model checking of SLP-compressed structures to the query-enumeration perspective. A preliminary version of our main result is stated below. It restricts to so-called apex SLPs. Roughly speaking, the apex property demands that each graph replacing a nonterminal must not contain other nonterminals at the “contact nodes” (the nodes the nonterminal was incident with). The apex property is well known from graph language theory [16, 17, 18] and has been used for SLPs in [39, 49].

Theorem 1.

Let d be a constant. For an FO-formula ϕ(x1,,xk) and a relational structure 𝒰, whose Gaifman graph has degree at most d, and that is given in compressed form by an apex SLP D, one can enumerate the result set ϕ(𝒰) after preprocessing time f(d,|ϕ|)|D| and delay f(d,|ϕ|) for some computable function f.

Note that the preprocessing is linear in the compressed size |D| instead of the data size |𝒰|.

We prove this result by extending the FO-query enumeration algorithm for uncompressed structures from [28] to the SLP-compressed setting. For this we have to overcome considerable technical barriers. The algorithm of [28] exploits the Gaifman locality of FO-queries. In the preprocessing phase the algorithm computes for each element a𝒰 the r-sphere around a for a radius r that only depends on the formula ϕ. This leads to a preprocessing time of |𝒰|f(d,ϕ). For an SLP-compressed structure we cannot afford to iterate over all elements of the structure. Inspired by a technique from [39], we will expand every nonterminal of the SLP D locally up to a size that depends only on ϕ and d. This leads to at most |D| substructures of size f(d,|ϕ|). Our enumeration algorithm then enumerates certain paths in the derivation tree defined by D and for each such path ending in a nonterminal A it searches in the precomputed local expansion of A for nodes with a certain sphere type.

Related work.

In the uncompressed setting, there are several classes of relational structures for which FO-query enumeration can be solved with linear (or quasi-linear) preprocessing and constant delay, e.g., relational structures with bounded degree [8, 13, 28], low degree [14], (locally) bounded expansion [29, 64], and structures that are tree-like [3, 30] or nowhere dense [61]; see [7, 63] for surveys. Moreover, for conjunctive queries with certain acyclicity conditions, linear preprocessing and constant delay is also possible for the class of all relational structures [4, 7]. The algorithm from [28] is the most relevant one for our work.

Concerning other work on query enumeration on SLP-compressed data, we mention [51, 59, 60], which deals with constant delay enumeration for (a fragment of) MSO-queries on SLP-compressed strings, and [45], which presents a linear preprocessing and constant delay algorithm for MSO-queries on SLP-compressed unranked forests.

SLPs for (hyper)graphs were introduced as hierarchical graph descriptions by Lengauer and Wanke [36] and have been further studied, e.g., in [9, 19, 20, 34, 35, 49, 48, 50]. Model checking problems for SLP-compressed graphs have been studied in [39] for FO and MSO, [26] for fixpoint logics, and [1, 2] for the temporal logics LTL and CTL in the context of hierarchical state machines (which are a particular type of graph SLPs). Particularly relevant for this paper is a result from [39] stating that for every level Σi𝖯 of the polynomial time hierarchy there is a fixed FO-formula for which the model checking problem for SLP-compressed input graphs is Σi𝖯-complete. In contrast, for apex SLPs the model checking problem for every fixed FO-formula belongs to NL (nondeterministic logspace) [39]. This (and the fact that FO-query enumeration reduces to FO model checking) partly explains the restriction to apex SLPs in Theorem 1.

Compression of graphs via graph SLPs has been considered in [57] following a “Sequitur scheme” [52] and in [47] following a “RePair scheme” [33] (see also [41]); note that both compressors produce graph SLPs that may not be apex.

Another recent concept in database theory that is concerned with compressed representations of relational data and query evaluation are factorized databases (see [31, 53, 54, 55, 56]). Intuitively speaking, in a factorized representation of a relational structure each relation R is represented as an expression over the relational operators union and product that evaluates to R. However, SLPs for relational structures and factorized representations cover completely different aspects of redundancy: A factorized representation is always at least as large as its active domain (i.e., all elements that occur in some tuple), while an SLP for a relational structure can be of logarithmic size in the size of the universe of the structure. On the other hand, small factorized representations do not seem to necessarily translate into small SLPs.

2 General Notations

Let ={0,1,2,}. For every k, we set [k]={1,2,,k}. For a finite alphabet A, we denote by A the set of all finite strings over A including the empty string ε. For a partial f:AB let dom(f)={aA:f(a)}A (where B stands for undefined) and ran(f)={f(a):adom(f)}B. For functions f:AB and g:BC we define the composition gf:AC by (gf)(a)=g(f(a)) for all aA.

A partial k-tuple over a set A is a partial function t:[k]A. If dom(t)=[k], then we also say that t is a complete k-tuple or just a k-tuple; in this case we also write t in the conventional form (t(1),t(2),,t(k)). Two partial k-tuples t1 and t2 are disjoint if dom(t1)dom(t2)=. In this case, their union t1t2 is the partial k-tuple defined by (t1t2)(j)=ti(j) if jdom(ti) for i{1,2} and (t1t2)(j)= if jdom(t1)dom(t2).

2.1 Directed acyclic graphs

An ordered dag (directed acyclic graph) is a triple G=(V,γ,ι), where V is a finite set of nodes, γ:VV is the child-function, the relation E:={(u,v):u,vV,v occurs in γ(u)} is acyclic, and ιV is the initial node. The size of G is |G|=vV(1+|γ(v)|). A node vV with |γ(v)|=0 is called a leaf.

A path in G (from v0 to vn) is a sequence p=v0i1v1i2vn1invnV(V) such that 1ik|γ(vk1)| for all k[n]. The length of this path p is n (we may have n=0 in which case p=v0) and we also call p a v0-to-vn path or v0-path if the end point vn is not important. An ι-path is also called an initial path. We extend this notation to subsets of V in the obvious way, e.g., for A,BV and vV we talk about A-to-v paths, A-to-B paths, A-to-leaf paths (where “leaf” refers to the set of all leaves of the dag), initial-to-leaf paths, etc. For a v0-to-v1 path p=pv1 and a v1-to-v2 path q=v1q we define the v0-to-v2 path pq=pv1q (note that if we just concatenate p and q as words, then we have to replace the double occurrence v1v1 by v1 to obtain pq). We say that the path p is a prefix of the path q if there is a path r such that q=qr.

Since we consider ordered dags, there is a natural lexicographical ordering on all v-paths (i.e., all paths that start in the same node v). More precisely, for two different v-paths p and q we write p<q if either p is a proper prefix of q or we can write p and q as p=rip, q=rjq for paths r,p,q and i,j with i<j.

2.2 Relational structures and first order logic

A signature is a finite set consisting of relation symbols ri (iI) and constant symbols cj (jJ). Each relation symbol ri has an associated arity αi. A structure over the signature is a tuple 𝒰=(U,(Ri)iI,(uj)jJ), where U is a finite non-empty set (the universe of 𝒰), RiUαi is the relation associated with the relation symbol ri, and ujU is the constant associated with the constant symbol cj. Note that we restrict to finite structures. If the structure 𝒰 is clear from the context, we will identify Ri (uj, respectively) with the relation symbol ri (the constant symbol cj, respectively). Sometimes, when we want to refer to the universe U, we will refer to 𝒰 itself. For instance, we write a𝒰 for uaU, or f:[n]𝒰 for a function f:[n]U. The size |𝒰| of 𝒰 is |U|+iIαi|Ri|. As usual, a constant a𝒰 may be replaced by the unary relation {a}. Thus, in the following, we will only consider signatures without constant symbols, except when we explicitly introduce constants. Let ={ri:iI} be such a signature (we call it a relational signature) and let 𝒰=(U,(Ri)iI) be a structure over (we call it a relational structure). For relational structures 𝒰1 and 𝒰2 over the signature , we write 𝒰1𝒰2 to denote that 𝒰1 and 𝒰2 are isomorphic. A substructure of 𝒰=(U,(Ri)iI) is a structure (V,(Si)iI) such that VU and SiRi for all iI. The substructure of 𝒰 induced by VU is (V,(RiVαi)iI). We define the undirected graph 𝒢(𝒰)=(U,E) (the so-called Gaifman graph of 𝒰), where E contains an edge (a,b) if and only if there is a binary relation Ri (iI) and a tuple (a1,,aαi)Ri with {a,b}{a1,,aαi}. The degree of 𝒰 is the maximal degree of a node in 𝒢(𝒰). If 𝒰 has degree at most d, we also say that 𝒰 is a degree-d bounded structures.

We use first-order logic (FO) over finite relational structures; see [15] for a detailed introduction and the full version [43] for some standard notations. For an FO-formula ψ(x1,,xk) over the signature with free variables x1,,xk and a relational structure 𝒰=(U,(Ri)iI) over and a1,,akU, we write 𝒰ψ(a1,,ak) if ψ is true in 𝒰 when the variable xi is set to ai for all i[k]. Hence, an FO-formula ψ(x1,,xk) can be interpreted as an FO-query that, for a given structure 𝒰, defines a result set

ψ(𝒰)={(a1,,ak)𝒰k:𝒰ψ(a1,,ak)}.

The quantifier rank 𝗊𝗋(ψ) of an FO-formula ψ is inductively defined as follows: 𝗊𝗋(ψ)=0 if ψ contains no quantifiers, 𝗊𝗋(¬ψ)=𝗊𝗋(ψ), 𝗊𝗋(ψ1ψ2)=𝗊𝗋(ψ1ψ2)=max{𝗊𝗋(ψ1),𝗊𝗋(ψ2)} and 𝗊𝗋(xψ)=𝗊𝗋(xψ)=1+𝗊𝗋(ψ).

In the rest of the paper, we assume that the signature only contains relation symbols of arity at most two. It is folklore that FO model checking and FO-query enumeration over arbitrary signatures can be reduced to this case; see the full version [43] for a possible construction. This construction can be carried out in linear time (in the size of the formula and the structure) and it increase the degree of the structure as well as the quantifier rank of the formula by at most one.

2.3 Distances, spheres and neighborhoods

Let us fix a relational signature (containing only relation symbols of arity at most two) and let 𝒰=(U,(Ri)iI) be a structure over this signature. We say that 𝒰 is connected, if its Gaifman graph 𝒢(𝒰) is connected. The distance between elements a,bU in the graph 𝒢(𝒰) is denoted by 𝖽𝗂𝗌𝗍𝒰(a,b) (it can be ). For subsets A,BU we define 𝖽𝗂𝗌𝗍𝒰(A,B)=min{𝖽𝗂𝗌𝗍𝒰(a,b):aA,bB}. For two partial tuples (of any arity) t,t over U let 𝖽𝗂𝗌𝗍𝒰(t,t)=𝖽𝗂𝗌𝗍𝒰(ran(t),ran(t)).

Fix a constant d2. We will only consider degree-d bounded structures in the following. Let us fix such a structure 𝒰 (over the relational signature ). Take additional constant symbols c1,c2, called sphere center constants. For an r1 and a partial k-tuple t:[k]𝒰 we define the r-sphere 𝒮𝒰,r(t)={b𝒰:𝖽𝗂𝗌𝗍𝒰(t,b)r}. The r-neighborhood 𝒩𝒰,r(t) of t is obtained by taking the substructure of 𝒰 induced by 𝒮𝒰,r(t) and then adding every node t(i) (idom(t)) as the interpretation of the sphere center constant ci. Hence, it is a structure over the signature {ci:idom(t)}. The r-neighborhood of a k-tuple has at most ki=0rdikdr+1 elements (here, the inequality holds since we assume d2).

We use the above definitions also for a single element a𝒰 in place of a tuple t; formally a is identified with the 1-tuple t such that t(1)=a. We are mainly interested in r-spheres and r-neighborhoods of complete k-tuples, but the corresponding notions for partial k-tuples will be convenient later. We also drop the subscript 𝒰 from the above notations if this does not cause any confusion.

A (k,r)-neighborhood type is an isomorphism type for the r-neighborhood of a complete k-tuple in a degree-d bounded structure. More precisely, we can define a (k,r)-neighborhood type as a degree-d bounded structure over the signature {c1,,ck} such that

  • the universe of is of the form [] for some kdr+1 and

  • for every j[] there is i[k] such that 𝖽𝗂𝗌𝗍(ai,j)r, where, for every i[k], ai is the interpretation of the sphere center constant ci.

From each isomorphism class of (k,r)-neighborhood types we select a unique representative and write 𝒯k,r for the set of all selected representatives. Then, for every k-tuple a¯𝒰k there is a unique 𝒯k,r such that 𝒩𝒰,r(a¯); we call it the (k,r)-neighborhood type of a¯ and say that a¯ is a -tuple. In case k=1 we speak of -nodes instead of -tuples, write 𝒯r for 𝒯1,r and call its elements r-neighborhood types instead of (1,r)-neighborhood types.

For every (k,r)-neighborhood type 𝒯k,r there is an FO-formula ψ(x1,,xk) such that for every degree-d bounded structure 𝒰 and every k-tuple a¯𝒰k we have 𝒰ψ(a¯) if and only if a¯ is a -tuple.

2.4 Enumeration algorithms and the machine model

FO-query enumeration is the following problem: Given an FO-formula ϕ(x1,,xk) over some signature and a relational structure 𝒰 over , we want to enumerate all tuples from ϕ(𝒰) in some order and without repetitions, i.e., we want to produce a sequence (t1,,tn,tn+1) with {t1,,tn}=ϕ(𝒰), |ϕ(𝒰)|=n and tn+1=𝖤𝖮𝖤 is the end-of-enumeration marker. An algorithm for FO-query enumeration starts with a preprocessing phase in which no output is produced, followed by an enumeration phase, where the elements t1,t2,,tn,tn+1 are produced one after the other. The running time of the preprocessing phase is called the preprocessing time, and the delay measures the maximal time between the computation of two consecutive outputs ti and ti+1 for every i[n].

Usually, one restricts the input structure 𝒰 to some subclass 𝖢d of relational structures that is defined by some parameter d (in this paper, 𝖢d is the class of degree-d bounded structures). We say that an algorithm for FO-query enumeration for 𝖢d has linear preprocessing and constant delay, if its preprocessing time is 𝒪(|𝒰|f(d,|ϕ|)) and its delay is 𝒪(f(d,|ϕ|)) for some computable function f. This complexity measure where the query ϕ is considered to be constant and the running time is only measured in terms of the data, i.e., the size of the relational structure, is also called data complexity. In data complexity, linear preprocessing and constant delay is considered to be optimal (since we assume that the relational structure has to be read at least once). As mentioned in the introduction, FO-query enumeration can be solved with linear preprocessing and constant delay for several classes 𝖢d.

For proving upper bounds in data complexity, we often have to argue that certain computational tasks can be performed in time f() (or |𝒰|f()) for some function f. In these cases, without explicitly mentioning this in the remainder, f will always be a computable function (actually, f will be elementary, i.e., bounded by an exponent tower of fixed height). The arguments for f will only depend on the parameter d and the formula size |ϕ|.

The special feature of this work is that we consider FO-query enumeration in the setting where the relational structure 𝒰 is not given explicitly, but in a potentially highly compressed form, and our enumeration algorithm must handle this compressed representation rather than decompressing it. Then the structure size |𝒰| will be replaced by the size of the compressed representation of 𝒰 in all time bounds. This aspect shall be explained in detail in Section 4.

We use the standard RAM model with uniform cost measure as our model of computation. We will make some further restrictions for the register length tailored to the compressed setting in Section 4.2.

3 FO-Enumeration over Uncompressed Degree-Bounded Structures

In this section, we fix a relational signature ={Ri:iI}, constants d2 and ν, a degree-d bounded structure 𝒰=(U,(Ri)iI) over the signature , and an FO-formula ϕ(x1,,xk) over the signature with 𝗊𝗋(ϕ)=ν. Our goal is to enumerate the set ϕ(𝒰) after a linear time preprocessing in constant delay. Before we consider the case where the structure 𝒰 is given in a compressed form, we will first outline the enumeration algorithm from [28] for the case where 𝒰 is given explicitly (with some modifications). In Section 5 we will extend this algorithm to the compressed setting.

By a standard application of the Gaifman locality of FO (see [43]), we first reduce the enumeration of ϕ(𝒰) to the enumeration of all -tuples from 𝒰k for a fixed 𝒯k,r (for some r7ν). Recall that contains at most kdr+1 elements, and this upper bound only depends on d and the formula ϕ. To simplify notation, we assume that in the sphere center constant ci is interpreted by i[k]. In particular, the sphere center constants are interpreted by different elements. This is not a real restriction; see [43].

In order to enumerate all -tuples, we will factorize into its connected components. In order to accomplish this, we need the following definitions. We first define the larger radius

ρ=2rkr+k1. (1)

Every ρ-neighborhood of an element a𝒰 has at most dρ+1 elements. Recall that a ρ-neighborhood type is a degree-d bounded structure over the signature 1:={c1} with a universe [] for some dρ+1. W.l.o.g. we assume that the sphere center constant c1 is interpreted by the element 1 in a ρ-neighborhood type. Hence, every j[] has distance at most ρ from 1. Moreover, the ρ-neighborhood types in 𝒯ρ are pairwise non-isomorphic.

Assume that our fixed (k,r)-neighborhood type splits into m1 connected components 𝒞1,,𝒞m. Thus, every 𝒞i is a connected induced substructure of , every node of belongs to exactly one 𝒞i, and there is no edge in the undirected graph 𝒢() between two different components 𝒞i. Let Di=𝒞i[k] be the set of sphere centers that belong to the connected component 𝒞i. We must have Di. Let ni=min(Di) (we could also fix any other element from Di). Every node in 𝒞i has distance at most r from some jDi. Since 𝒞i is connected, it follows that every node in 𝒞i has distance at most r+(k1)(2r+1)=2rkr+k1=ρ from ni (this is in fact true for every jDi instead of ni). A consistent factorization of our (k,r)-neighborhood type is a tuple

Λ=(1,σ1,2,σ2,,m,σm)

with the following properties for all i[m]:

  • i𝒯ρ and σi:[k]i is a partial k-tuple with dom(σi)=Di and σi(ni)=1 (so, ni is mapped by σi to the center of i) and

  • 𝒩i,r(σi)𝒞i.

Clearly, the number of possible consistent factorizations of is bounded by f(d,|ϕ|).

For a ρ-neighborhood type , a -node a𝒰 and a partial k-tuple σ:[k] we moreover fix an isomorphism πa:𝒩𝒰,ρ(a) (this isomorphism is not necessarily unique) and define the partial k-tuple ta,σ:[k]𝒰 by ta,σ(j)=πa(σ(j)) for all jdom(σ). Note that, by definition, πa(1)=a.

Take a consistent factorization Λ=(1,σ1,,m,σm) of . We say that an m-tuple (b1,,bm)𝒰m is admissible for Λ if the following conditions hold:

  • for all i[m], bi is a i-node, and

  • for all i,j[m] with ij we have

    𝖽𝗂𝗌𝗍𝒰(tbi,σi,tbj,σj)>2r+1. (2)

Finally, with an m-tuple b¯=(b1,,bm) we associate the k-tuple

Λ(b¯)=tb1,σ1tb2,σ2tbm,σm.

Note that tbi,σi(ni)=πbi(σi(ni))=πbi(1)=bi.

We claim that in order to enumerate all -tuples a¯𝒰k, it suffices to enumerate for every consistent factorization Λ=(1,σ1,,m,σm) of the set of all m-tuples b¯𝒰m that are admissible for Λ. If we can do this, then we replace every output tuple b¯𝒰m by Λ(b¯)𝒰k. Note that Λ(b¯) can be easily computed in time 𝒪(k) from the tuple b¯, the isomorphisms πbi, and the partial k-tuples σi:[k]i. The correctness of this algorithm follows from the following two lemmas (with full proofs in [43]):

Lemma 2.

If Λ is a consistent factorization of and b¯𝒰m is admissible for Λ then Λ(b¯)𝒰k is a -tuple.

Lemma 3.

If a¯𝒰k is a -tuple then there are a unique consistent factorization Λ of and a unique m-tuple b¯𝒰m that is admissible for Λ and such that a¯=Λ(b¯).

3.1 Enumeration algorithm for uncompressed structures

Let us fix a (k,r)-neighborhood type and a consistent factorization Λ=(1,σ1,,m,σm) of . By Lemmas 3 and 3, it suffices to enumerate (with linear preprocessing and constant delay) the set of all a¯𝒰m that are admissible for Λ. In the preprocessing phase we compute

  • for every i[m] a list Li containing all i-nodes from 𝒰 and

  • for every aLi an isomorphism πa:i𝒩ρ(a).

It is straightforward to compute these data in time |𝒰|f(d,|ϕ|) (in Section 5, where we deal with the more general SLP-compressed case, this is more subtle). We classify each list Li as being short if |Li|kd2ρ+2r+2 and as being long otherwise. Without loss of generality, we assume that, for some 0qm the lists L1,,Lq are short and the lists Lq+1,,Lm are long (note that this includes the cases that all lists are short or all lists are long).

Our enumeration procedure maintains a stack of the form a1a2a with 0m and aiLi for all i[]. Note that if =0, then we have the empty stack ε. Such a stack is called admissible for Λ (or just admissible), if for all i,i[] with ii and all jdom(σi) and jdom(σi) we have 𝖽𝗂𝗌𝗍𝒰(πai(σi(j)),πai(σi(j)))>2r+1. Note that the empty stack as well as every stack a1 with a1L1 are admissible.

The general structure of our enumeration algorithm is a depth-first-left-to-right (DFLR) traversal over all admissible stacks s. For this, it calls the recursive procedure extend (shown as Algorithm 1) with the initial admissible stack s=ε. Note that whenever extend(s) is called, |s|<m holds. It is clear that the call extend(ε) triggers the enumeration of all admissible stacks a1a2am. In an implementation one would store s as a global variable.

Algorithm 1 𝖾𝗑𝗍𝖾𝗇𝖽(s).

Let us assume that we can check whether a stack s is admissible in time f(d,|ϕ|) (it is not hard to see that this is possible, and this aspect will anyway be discussed in detail for the compressed setting in Section 5). After the initial call extend(ε), the algorithm constructs an admissible stack s with |s|=q (or terminates) after time bounded in d,k,r and ρ (since the lists L1,,Lq are short). If some aLq+1 is non-admissible, i.e., the stack sa is not admissible, then 𝖽𝗂𝗌𝗍𝒰(tai,σi,ta,σq+1)2r+1 and therefore 𝖽𝗂𝗌𝗍(ai,a)2ρ+2r+1 for some i[q]. Thus, the total number of non-admissible elements from Lq+1 can be bounded by a function of d,k,r and ρ. Consequently, since Lq+1 is long, the algorithm necessarily finds some admissible aLq+1 (or terminates) after time bounded in d,k,r and ρ. From this observation, the following lemma can be obtained with moderate effort; see [43].

Lemma 4.

The delay of the above enumeration procedure is bounded by f(d,|ϕ|).

4 Straight-Line Programs for Relational Structures

In this section, we introduce the compression scheme that shall be used to compress relational structures. We first need the definition of pointed structures.

For n0, an n-pointed structure is a pair (𝒰,τ), where 𝒰 is a structure and τ:[n]𝒰 is injective. The elements in ran(τ) (𝒰ran(τ), respectively) are called contact nodes (internal nodes, respectively). The node τ(i) is called the i-th contact node.

A relational straight-line program (r-SLP or just SLP) is a tuple D=(,N,S,P), where

  • is a relational signature,

  • N is a finite set of nonterminals, where every AN has a rank 𝗋𝖺𝗇𝗄(A),

  • SN is the initial nonterminal, where 𝗋𝖺𝗇𝗄(S)=0, and

  • P is a set of productions that contains for every AN a unique production A(𝒰A,τA,EA) with (𝒰A,τA) a 𝗋𝖺𝗇𝗄(A)-pointed structure over the signature and EA a multiset of references of the form (B,σ), where BN and σ:[𝗋𝖺𝗇𝗄(B)]𝒰A is injective.

  • Define the binary relation D on N as follows: ADB if and only if EA contains a reference of the form (B,σ). Then we require that D is acyclic. Its transitive closure D is a partial order that we call the hierarchical order of D.

Let |D|=AN(|𝒰A|+(B,σ)EA(1+𝗋𝖺𝗇𝗄(B))) be the size of D. We define the ordered dag 𝖽𝖺𝗀(D)=(N,γ,S), where the child-function γ is defined as follows: Let BN and let (B1,σ1),,(Bm,σn) be an enumeration of the references in EB, where every reference appears in the enumeration as many times as in the multiset EB. The specific order of the references is not important and assumed to be somehow given by the input encoding of D We then define γ(B)=B1Bn.

We now define for every nonterminal AN a 𝗋𝖺𝗇𝗄(A)-pointed relational structure 𝗏𝖺𝗅(A) (the value of A). We do this on an intuitive level, a formal definition can be found in the full version [43]. If EA=, then we define 𝗏𝖺𝗅(A)=(𝒰A,τA). If, on the other hand, EA, then 𝗏𝖺𝗅(A) is obtained from (𝒰A,τA) by expanding all references in EA. A reference (B,σ)EA is expanded by the following steps: (i) create the disjoint union of 𝒰A and 𝒰B, (ii) merge node τB(i)𝒰B with node σ(i)𝒰A for every i[𝗋𝖺𝗇𝗄(B)], (iii) remove (B,σ) from EA, and (iv) add all references from EB to EA. Due to the fact that D is acyclic, we can keep on expanding references (the original ones from EA and the new ones added by the expansion operation) in any order until there are no references left. The resulting relational structure is 𝗏𝖺𝗅(A); see Example 4 and Figure 1 for an illustration.

We define 𝗏𝖺𝗅(D)=𝗏𝖺𝗅(S). Since 𝗋𝖺𝗇𝗄(S)=0 it can be viewed as an ordinary (0-pointed) structure. It is not hard to see that |𝗏𝖺𝗅(D)|2𝒪(|D|) and that this upper bound can be also reached. Thus, D can be seen as a compressed representation of the structure 𝗏𝖺𝗅(D).

In Section 2.2 we claimed that FO-query enumeration can be reduced to the case where only contains relation symbols of arity at most two (with the details given in the full version [43]). It is easy to see that this reduction can be also done in the SLP-compressed setting simply by applying the reduction to all structures 𝒰A; details can be again found in [43].

We say that the SLP D=(,N,S,P) is apex, if for every AN and every reference (B,σ)EA we have ran(σ)ran(τA)=. Thus, contact nodes of a right-hand side cannot be accessed by references. Apex SLPs are called 1-level restricted in [49]. It is easy to compute the maximal degree of nodes in 𝒢(𝗏𝖺𝗅(D)) for an apex SLP D: for every node v in a structure 𝒰A compute dv as the sum of (i) the degree of v in 𝒢(𝒰A) and (ii) for every reference (B,σ)EA and every i[𝗋𝖺𝗇𝗄(B)] with v=σ(i), the degree of τB(i) in 𝒢(𝒰B). Then the maximum of all these numbers dv is the maximal degree of nodes in 𝒢(𝗏𝖺𝗅(D)). The apex property implies a certain locality property for 𝗏𝖺𝗅(D) that will be explained in Section 4.1. In the rest of the paper we will mainly consider apex SLPs.

A simple example of a class of graphs that are exponentially compressible with apex SLPs is the class of perfect binary trees. The perfect binary tree of height n (with 2n leaves) can be produced by an apex SLP of size 𝒪(n). Here is an explicit example for an apex SLP:

Figure 1: The SLP D of Example 4 together with 𝖽𝖺𝗀(D) and 𝗏𝖺𝗅(X) for X{S,A,B}.
Example 5.

Consider the SLP D=(,N,S,P) where only contains a binary relation symbol r1 and N={S,A,B} with 𝗋𝖺𝗇𝗄(S)=0, 𝗋𝖺𝗇𝗄(A)=1 and 𝗋𝖺𝗇𝗄(B)=2. The productions of these nonterminals are depicted on the left of Figure 1. For instance, the production S(𝒰S,τS,ES) consists of the 0-pointed structure (𝒰S,τS), where the universe of 𝒰S consists of the two red nodes u and v, and the reference set ES={(A,σ1),(A,σ2),(B,σ3)} with σ1(1)=u, σ2(1)=v, σ3(1)=u and σ3(2)=v (in Figure 1 each σi(j) is connected by a j-labeled dotted line with the nonterminal). The production for nonterminal B consists of a 2-pointed structure (and no references), the contact nodes of which are labeled by 1 and 2. The structure 𝗏𝖺𝗅(D)=𝗏𝖺𝗅(S) is shown on the right of Figure 1. It can be obtained by first constructing 𝗏𝖺𝗅(A) by replacing the single B-reference in 𝒰A by 𝒰B=𝗏𝖺𝗅(B). Note that 1- and 2-labeled dotted lines identify the two nodes to be merged with the two contact nodes of 𝒰B, and that 𝗏𝖺𝗅(A) has exactly one contact node. Then we replace the B-reference in 𝒰S by 𝗏𝖺𝗅(B) and both A-references in 𝒰S by 𝗏𝖺𝗅(A). This merges u (and v) with the contact node of the first (and the second) occurrence of 𝗏𝖺𝗅(A). Red (resp., blue, green) edges and nodes are produced from S (resp., A, B).

Since no contact node is adjacent to any reference, this SLP is apex. The size of 𝗏𝖺𝗅(D) is 31. The size of D is 26: 9 (for the S-production) + 10 (for the A-production) + 7 (for the B-production).

4.1 Representation of nodes of an SLP-compressed structure

Let AN. A node a𝗏𝖺𝗅(A) can be uniquely represented by a pair (p,v) such that p is an A-path in 𝖽𝖺𝗀(D) and one of the following two cases holds:

  • p ends in BN{A} and v𝒰Bran(τB) is an internal node.111The nodes in ran(τB), i.e., the contact nodes of 𝒰B, are excluded here, because they were already generated by some larger (with respect to the hierarchical order D) nonterminal.

  • p=A and v𝒰A.

We call this the A-representation of a. The S-representations of the nodes of 𝗏𝖺𝗅(S)=𝗏𝖺𝗅(D) are also called D-representations. Note that if (p,v) is the D-representation of a node then v𝒰Aran(τA) for some AN (since 𝗋𝖺𝗇𝗄(S)=0). We will often identify a node of 𝗏𝖺𝗅(A) with its A-representation; in particular when A=S. One may view a D-representation (p,v) as a stack pv. In order to construct outgoing (or incoming) edges of (p,v) in the structure 𝗏𝖺𝗅(D), one only has to modify this stack at its end; see [43] for more details.

The apex condition implies a kind of locality in 𝗏𝖺𝗅(D) that can be nicely formulated in terms of D-representations: If two nodes a=(p,u) and b=(q,v) have distance ζ in the graph 𝒢(𝗏𝖺𝗅(D)) then the prefix distance between p and q (which is the number of edges in p and q that do not belong to the longest common prefix of p and q) is also at most ζ. This property is exploited several times in the paper.

Based on A-representations, we can define a natural embedding of 𝗏𝖺𝗅(B) into 𝗏𝖺𝗅(A) in case ADB. Assume that p is a non-empty A-to-B path in 𝖽𝖺𝗀(D) with AB. Let us write p=pCiB for some nonterminal C (we may have C=A). Let (B,σ)EC be the unique reference that corresponds to the edge (C,i,B) in 𝖽𝖺𝗀(D). We then define the embedding ηp:𝗏𝖺𝗅(B)𝗏𝖺𝗅(A) as follows, where (q,v) is a node in 𝗏𝖺𝗅(B) given by its B-representation so that q is a B-path (recall that the path pq is obtained by concatenating the paths p and q; see Section 2.1):

ηp(q,v)={(pC,σ(i)) if q=B and v=τB(j) for some j[𝗋𝖺𝗇𝗄(B)],(pq,v) otherwise.

We can extend this definition to the case A=B (where p=A) by defining ηp as the identity map on 𝗏𝖺𝗅(A)=𝗏𝖺𝗅(B). If 𝒰 is the substructure of 𝗏𝖺𝗅(B) induced by the set U𝗏𝖺𝗅(B) then we write ηp(𝒰) for the substructure of 𝗏𝖺𝗅(A) induced by the set ηp(U). Note that in general we do not have ηp(𝒰)𝒰. For instance, if 𝒰=𝗏𝖺𝗅(B) then in 𝗏𝖺𝗅(A) there can be edges between contact nodes of 𝗏𝖺𝗅(B) that are generated by a nonterminal C with CDB.

Recall the definition of the lexicographic order on the set of all A-paths of 𝖽𝖺𝗀(D) for AN (see Section 2.1). We define 𝗅𝖾𝗑A(p) as the position of p in the lexicographically sorted list of all A-paths of 𝖽𝖺𝗀(D), where we start with 0 (i.e., 𝗅𝖾𝗑A(A)=0; note that A is the empty path starting in A and hence the lexicographically smallest path among all A-paths). For 𝗅𝖾𝗑S(p) we just write 𝗅𝖾𝗑(p). Later it will be convenient to represent the initial path component p of a D-representation (p,v) by the number 𝗅𝖾𝗑(p) and call (𝗅𝖾𝗑(p),v) be the lex-representation of the node a=(p,v)𝗏𝖺𝗅(D). The number of initial paths in 𝖽𝖺𝗀(D) can be bounded by 2𝒪(|D|): the number of initial-to-leaf paths in 𝖽𝖺𝗀(D) is bounded by 3|𝖽𝖺𝗀(D)|/33|D|/3 (this is implicitly shown in the proof of [11, Lemma 1]) and the number of all initial paths in D is bounded by twice the number of initial-to-leaf paths in D. Hence, the numbers 𝗅𝖾𝗑(p) have bit length 𝒪(|D|).

Example 6.

Recall the SLP D from Example 4 and 𝖽𝖺𝗀(D) shown to the right of D’s productions in Figure 1. Then the pairs (S,u) and (S,v) (recall that u and v are the two nodes of 𝒰S) represent the two red nodes of 𝗏𝖺𝗅(D)=𝗏𝖺𝗅(S), and (S3B,w), where w is the green node in 𝒰B, represents the rightmost green node of 𝗏𝖺𝗅(D). Its lex-representation is (5,w) (there are six initial paths in 𝖽𝖺𝗀(D)). As another example, the two leftmost (green) nodes of 𝗏𝖺𝗅(D) are represented by the pairs (S1A1B,w) and (S2A1B,w) with the lex-representations (2,w) and (4,w), respectively. For the S-to-B path p=S2A1B in 𝖽𝖺𝗀(D) we have ηp(B,w)=(S2A1B,w) and ηp(B,τB(1))=(S2A,σ(1)), where (B,σ) is the only reference in EA.

4.2 Register length in the compressed setting

In the following sections we will develop an enumeration algorithm for the set of all tuples in ϕ(𝗏𝖺𝗅(D)), where the SLP D is part of the input. Recall that 𝗏𝖺𝗅(D) may contain 2Θ(|D|) many elements. In order to achieve constant delay, we therefore should set the register length in our algorithm to Θ(|D|) so that we can store elements of 𝗏𝖺𝗅(D). This is in fact a standard assumption for algorithms on SLP-compressed objects. For instance, when dealing with SLP-compressed strings, one usually assumes that registers can store positions in the decompressed string. We only allow additions, subtractions and comparisons on these Θ(|D|)-bit registers and these operations take constant time (since we assume the uniform cost measure). For registers of length 𝒪(log|D|) we will also allow pointer operations.

Note that a D-representation (p,v) needs 𝒪(|D|) many 𝒪(log|D|)-bit registers, whereas its lex-representation (𝗅𝖾𝗑(p),v) fits into two registers (one of length 𝒪(log|D|)).

5 FO-Enumeration over SLP-Compressed Degree-Bounded Structures

We now have all definitions available in order to state a more precise version of Theorem 1:

Theorem 7.

Given an apex SLP D such that 𝗏𝖺𝗅(D) is degree-d bounded and an FO-formula ϕ(x1,,xk), we can enumerate the result set ϕ(𝒰) with preprocessing time f(d,|ϕ|)|D| and delay f(d,|ϕ|) for some computable function f. All nodes of ϕ(𝒰) are output in their lex-representation.

Throughout Section 5 we fix D=(,N,S,P) and ϕ(x1,,xn) as in Theorem 7. Let 𝗊𝗋(ϕ)=ν. W.l.o.g. we can assume that d2.

The general structure of our enumeration algorithm is the same as for the uncompressed setting. In particular, we also use Gaifman-locality to reduce to the problem of enumerating for a fixed 𝒯k,r the set of all -tuples a¯𝗏𝖺𝗅(D)k, which then reduces to the problem of enumerating for all consistent factorizations Λ=(1,σ1,,m,σm) of the set of all m-tuples b¯𝗏𝖺𝗅(D)m that are admissible for Λ (see the beginning of Section 3).

Here, a first complication occurs: one important component of the above reduction for the uncompressed setting is that FO model checking on degree-d bounded structures can be done in time |𝒰|f(d,|ϕ|) [62]. For the SLP-compressed setting we do not have a linear time (i. e., in time |D|f(d,|ϕ|)) model checking algorithm. Only an NL-algorithm for apex SLPs is known [39]. It is not hard to obtain a linear time algorithm from the NL-algorithm in [39]. Alternatively, one can also bypasses model checking; see the full version [43].

Consequently, as in the uncompressed setting, it suffices to consider a fixed consistent factorization Λ=(1,σ1,,m,σm) of and to enumerate the set of all m-tuples in 𝗏𝖺𝗅(D) that are admissible for Λ. As before we define the larger radius ρ=2rkr+k1; see (1).

5.1 Expansions of nonterminals

In this section we introduce the concept of ζ-expansions for a constant ζ1 (later, ζ will be a constant of the form f(d,|ϕ|)), which will be needed to transfer the enumeration algorithm for the uncompressed setting (Section 3.1) to the SLP-compressed setting. The idea is to apply the productions from D, starting with a nonterminal AN, until all nodes of 𝗏𝖺𝗅(A) that have distance at most ζ from the nodes in the right-hand side of A (except for the contact nodes of A) are produced. For a nonterminal AN we define

𝖨𝗇A={(A,v):v𝒰Aran(τA)}𝗏𝖺𝗅(A).

These are the internal nodes of 𝗏𝖺𝗅(A) (written in A-representation) that are directly produced with the production A(𝒰A,τA,EA). Let a1,,am be a list of all nodes from 𝖨𝗇A. We then define the ζ-expansion as the following induced substructure of 𝗏𝖺𝗅(A):

ζ(A)=𝒩𝗏𝖺𝗅(A),ζ(a1,,am).

We always assume that the nodes of ζ(A) are represented by their A-representations. Let

𝖡𝖽A,ζ={(A,v):vran(τA)}{a𝗏𝖺𝗅(A):𝖽𝗂𝗌𝗍𝗏𝖺𝗅(A)(𝖨𝗇A,a)=ζ}𝗏𝖺𝗅(A)

be the boundary of ζ(A). A valid substructure of ζ(A) is an induced substructure 𝒜 of ζ(A) with 𝒜𝖡𝖽A,ζ=𝒜𝖨𝗇A. If 𝒜 is a valid substructure of ζ(A) and p is an S-to-A path in 𝖽𝖺𝗀(D), then any neighbor of ηp(𝒜) in the graph 𝒢(𝗏𝖺𝗅(D)) belongs to ηp(ζ(A)). Moreover, ηp(𝒜)𝒜, since all contact nodes (A,τA(i)) are excluded from a valid substructure of ζ(A). In the following, we consider the radius ζ=2ρ+1. For a nonterminal AN we write (A) for the expansion 2ρ+1(A) in the rest of the paper.

Fix a ρ-neighborhood type . A node a(A)𝗏𝖺𝗅(A) is called a valid -node in (A) if (i) 𝒩(A),ρ(a) and (ii) 𝒩(A),ρ(a) is a valid substructure of (A). We say that A is -useful if there is a valid -node in (A). We consider now the following two sets:

  • 𝖲1={(p,a):AN: p is an S-to-A path in 𝖽𝖺𝗀(D)a is a valid -node in (A)}

  • 𝖲2={b𝗏𝖺𝗅(D):b is a -node}

We define a mapping h:𝖲1𝗏𝖺𝗅(D) as follows. Let (p,a)𝖲1, where p is an S-to-A path in 𝖽𝖺𝗀(D) and let (q,v) be the A-representation of a(A). We then define h(p,a)=ηp(a)=(pq,v) (where the latter is a D-representation that we identify as usual with a node from 𝗏𝖺𝗅(D)). The following lemma is proved in the full version [43].

Lemma 8.

The mapping h is a bijection from 𝖲1 to 𝖲2.

5.2 Overview of the enumeration algorithm

Our goal is to carry out the algorithm described in Section 3.1, but in the compressed setting, i.e., by only using the apex SLP D=(,N,S,P) instead of the explicit structure 𝗏𝖺𝗅(D). As in the uncompressed setting, it suffices to consider a fixed (k,r)-neighborhood type 𝒯k,r together with a fixed consistent factorization

Λ=(1,σ1,,m,σm) (3)

of and to enumerate the set of all m-tuples in 𝗏𝖺𝗅(D) that are admissible for Λ. In the following we sketch the algorithm; details can be found in the full version [43].

Enumeration of all 𝓑𝒊-nodes.

The algorithm for the uncompressed setting (Section 3) precomputes for every i a list Li of all i-nodes of the structure 𝒰. This is no longer possible in the compressed setting since the structure 𝗏𝖺𝗅(D) is too big. However, as shown in Section 5.1, there is a bijection between the set of i-nodes in 𝗏𝖺𝗅(D) and the set of all pairs (p,a), where p is an S-to-A path in 𝖽𝖺𝗀(D) for a i-useful nonterminal A and a is a valid i-node in (A) that is written in its A-representation (q,v). Hence, on a high level, instead of explicitly precomputing the lists Li of all i-nodes, we enumerate them with Algorithm 2.

Algorithm 2 enumeration of all i-nodes.

To execute this algorithm we first have to compute in the preprocessing all expansions (A) for a nonterminal A. This is easy: using a breath-first-search (BFS), we locally generate 𝗏𝖺𝗅(A) starting with the nodes in 𝖨𝗇A until all nodes a𝗏𝖺𝗅(A) with 𝖽𝗂𝗌𝗍𝗏𝖺𝗅(A)(𝖨𝗇A,a)2ρ+1 are generated. The size of (A) is bounded by |𝒰A|f(d,|ϕ|) (the size of a (2ρ+1)-sphere around a tuple of length at most |𝒰A| in a degree-d bounded structure) and can be constructed in time |𝒰A|f(d,|ϕ|). Summing over all AN shows that all (2ρ+1)-expansions can be precomputed in time |D|f(d,|ϕ|).

With the (A) available, we can easily precompute brute-force the set of all valid i-nodes in (A) (needed in Line 2 of Algorithm 2) and then the set of all i-useful nonterminals (needed in Line 1 of Algorithm 2). Recall that A is i-useful iff there is a valid i-node in (A). Moreover, for every valid i-node c=(q,v)(A) we compute also an isomorphism πc:i𝒩(A),ρ(ci). The time for this is bounded by f(d,|φ|) for one nonterminal A and hence by |D|f(d,|ϕ|) in total.

The most challenging part of Algorithm 2 is the enumeration of all initial paths p in 𝖽𝖺𝗀(D) that end in a i-useful nonterminal (Line 1). Let 𝒫i be the set of these paths. In constant delay, we cannot afford to output a path p𝒫i as a list of edges (it does not fit into a constant number of registers in our machine model, see Section 4.2). That is why we return the number 𝗅𝖾𝗑(p) (which fits into a single register in our machine model) in Line 3. The idea for constant-delay path enumeration is to run over all paths p𝒫i in lexicographical order and thereby maintain the number 𝗅𝖾𝗑(p). The path p is internally stored in a contracted form. If 𝖽𝖺𝗀(D) would be a binary dag, then we could use an enumeration algorithm from [42], where maximal subpaths of left (right, respectively) outgoing edges are contracted to single edges. In our setting, 𝖽𝖺𝗀(D) is not a binary dag, therefore we have to adapt the technique from [42] slightly; see [43].

In order to see how Algorithm 2 can be used to replace the precomputed lists Li in Algorithm 1 for the uncompressed setting, a few additional points have to be clarified.

Producing the final output tuples.

Note that for each enumerated i-node bi𝗏𝖺𝗅(D) we have to produce the partial k-tuple tbi,σi (then the final output tuple is tb1,σ1tb2,σ2tbm,σm). Let us first recall that in the uncompressed setting each partial k-tuple tbi,σi is defined by tbi,σi(j)=πbi(σi(j)) for all jdom(σi), where πbi:i𝒩𝒰,ρ(bi) is a precomputed isomorphism. In the compressed setting, Algorithm 2 outputs every i-node bi𝗏𝖺𝗅(D) as a triple (𝗅𝖾𝗑(pi),qi,v), where the initial path pi𝒫i ends in some i-useful nonterminal AiN and ci:=(qi,vi) is a valid i-node in (Ai). Moreover, we have a precomputed isomorphism πci:i𝒩(Ai),ρ(ci), which yields the isomorphism πbi=ηpiπci:i𝒩𝗏𝖺𝗅(D),ρ(bi). Then, for every jdom(σi) we can easily compute the lex-representation of πbi(σi(j)). We first compute πci(σi(j)) in its Ai-representation (qi,j,vi,j) using the precomputed mapping πci. Then the lex-representation of tbi,σi(j)=πbi(σi(j)) is (𝗅𝖾𝗑(piqi,j),vi,j), where 𝗅𝖾𝗑(piqi,j)=𝗅𝖾𝗑(pi)+𝗅𝖾𝗑Ai(qi,j). Here, 𝗅𝖾𝗑(pi) is produced by Algorithm 2. The path qi,j has length at most 2ρ+1 (this is a consequence of the apex condition for D). Its 𝗅𝖾𝗑-number 𝗅𝖾𝗑Ai(qi,j) can be computed by summing at most 2ρ+1 many edge weights that were computed in the preprocessing phase.

Count total number of 𝝆-neighborhoods.

In Section 3.1 we distinguish between short and long lists Li. Since in our compressed setting, Algorithm 2 replaces the precomputed list Li we have to count the number of triples produced by Algorithm 2 (of course, before we run the algorithm) in the preprocessing phase. This is easy: the number of output triples can be computed by summing over all i-useful nonterminals A the product of (i) the number of S-to-A paths in 𝖽𝖺𝗀(D) and (ii) the number of valid i-nodes in (A). The latter can be computed in the preprocessing phase. Computing the number of S-to-A paths (for all AN) involves a top-down pass (starting in S) over 𝖽𝖺𝗀(D) with |𝖽𝖺𝗀(D)||D| many additions on 𝒪(|D|)-bit numbers in total.

Checking distance constraints.

Recall that we fixed the consistent factorization Λ from (3) of the fixed (k,r)-neighborhood type and want to enumerate all tuples (b1,,bm)𝗏𝖺𝗅(D)m that are admissible for Λ. The definition of an admissible tuple also requires to check whether 𝖽𝗂𝗌𝗍𝗏𝖺𝗅(D)(tbi,σi,tbj,σj)>2r+1 for all ij (see (2)). The nodes bi are enumerated with Algorithm 2, hence the following assumptions hold for all i[m]:

  • bi is given by a triple (𝗅𝖾𝗑(pi),qi,vi),

  • pi is an initial-to-Ai path in 𝖽𝖺𝗀(D) (for some i-useful nonterminal Ai), and

  • ci:=(qi,vi) is a node (written in Ai-representation) from (Ai) such that ci has ρ-neighborhood type i in (Ai) and 𝒩(Ai),ρ(ci) is a valid substructure of (Ai).

In a first step, we show that if 𝖽𝗂𝗌𝗍𝗏𝖺𝗅(D)(tbi,σi,tbj,σj)2r+1 then there is a path q of length at most 3ρr such that pi=pjq or pj=piq. For this, the apex property for D is important, since it lower bounds the distance between two nodes a=(p,u) and a=(p,v) of 𝗏𝖺𝗅(D) by the prefix distance between the paths p and p (i.e., the total number of edges that do not belong to the longest common prefix of p and p).

We then proceed in two steps: We first check in time f(d,|ϕ|) whether pj=piq or pi=pjq for some path q of length at most 3ρr. For checking pj=piq (the case pi=pjq is analogous) we check whether pj=pi (by checking 𝗅𝖾𝗑(pj)=𝗅𝖾𝗑(pi)) and if this is not the case, we repeatedly remove the last edge of pj (for at most 3ρr times) and check whether the resulting path equals pi. However, the whole procedure is complicated by the fact that pi and pj are given in a contracted form, where some subpaths are contracted to single edges (see the above paragraph on the path enumeration algorithm for 𝖽𝖺𝗀(D)).

In the second step we have to check in time f(d,|ϕ|) whether 𝖽𝗂𝗌𝗍𝗏𝖺𝗅(D)(tbi,σi,tbj,σj)2r+1, assuming that pj=piq for some path q of length at most 3ρr. This boils down to checking, for every bran(tbi,σi) and bran(tbj,σj), whether 𝖽𝗂𝗌𝗍𝗏𝖺𝗅(D)(b,b)2r+1, which is the case iff 𝖽𝗂𝗌𝗍𝗏𝖺𝗅(Ai)(c,ηq(c))2r+1, where c,c𝗏𝖺𝗅(Ai) correspond to b,b in the sense that ηpi(c)=b and ηpj(c)=b. For this we locally construct 𝒩𝗏𝖺𝗅(Ai),2r+1(c) by starting a BFS in c and then computing all elements of 𝗏𝖺𝗅(Ai) with distance at most 2r+1 from c just like we constructed all expansions (A) (as explained above). This concludes our proof sketch for Theorem 7.

6 Conclusions and Outlook

We presented an enumeration algorithm for FO-queries on structures that are represented succinctly by apex SLPs. Assuming that the formula is fixed and the degree of the structure is bounded by a constant, the preprocessing time of our algorithm is linear and the delay is constant.

There are several possible directions into which our result can be extended. One option is to use more general formalisms for graph compression. Our SLPs are based on Courcelle’s HR (hyperedge replacement) algebra, which it tightly related to tree width [12, Section 2.3]. Our SLPs can be viewed as dag-compressed expressions in the HR algebra, where the leaves can be arbitrary pointed structures; see [39] for more details. Another (and in some sense more general) graph algebra is the VR algebra, which is tightly related to clique width [12, Section 2.5]. It is straightforward to define a notion of SLPs based on the VR algebra and this leads to the question whether our result also holds for the resulting VR-algebra-SLPs.

Another interesting question is to what extend the results on enumeration for conjunctive queries [4, 7] can be extended to the compressed setting. In this context, it is interesting to note that model checking for a fixed existential FO-formula on SLP-compressed structures (without the apex restriction) belongs to NL. It would be interesting to see, whether the constant delay enumeration algorithm from [4] for free-connex acyclic conjunctive queries can be extended to SLP-compressed structures.

Finally, one may ask whether in our main result (Theorem 7) the apex restriction is really needed. More precisely, consider an SLP D such that 𝗏𝖺𝗅(D) has degree d. Is it possible to construct from D in time |D|f(d) an equivalent apex SLP D of size |D|f(d) for a computable function f? If this is true then one could enforce the apex property in the preprocessing. In [17] it shown that a set of graphs of bounded degree d that can be produced by a hyperedge replacement grammar (HRG) H can be also produced by an apex HRG, but the size blow-up is not analyzed with respect to the parameter d and the size of H.

References

  • [1] Rajeev Alur, Michael Benedikt, Kousha Etessami, Patrice Godefroid, Thomas W. Reps, and Mihalis Yannakakis. Analysis of recursive state machines. ACM Transactions on Programming Languages and Systems (TOPLAS), 27(4):86–818, 2005. doi:10.1145/1075382.1075387.
  • [2] Rajeev Alur and Mihalis Yannakakis. Model checking of hierarchical state machines. ACM Transactions on Programming Languages and Systems (TOPLAS), 23(3):273–303, 2001. doi:10.1145/503502.503503.
  • [3] Guillaume Bagan. MSO queries on tree decomposable structures are computable with linear delay. In Proceedings of the 20th International Workshop on Computer Science Logic, CSL 2006, volume 4207 of Lecture Notes in Computer Science, pages 167–181. Springer, 2006. doi:10.1007/11874683_11.
  • [4] Guillaume Bagan, Arnaud Durand, and Etienne Grandjean. On acyclic conjunctive queries and constant delay enumeration. In Proceedings of the 21st International Workshop on Computer Science Logic, CSL 2007, volume 4646 of Lecture Notes in Computer Science, pages 208–222. Springer, 2007. doi:10.1007/978-3-540-74915-8_18.
  • [5] Hideo Bannai, Momoko Hirayama, Danny Hucke, Shunsuke Inenaga, Artur Jeż, Markus Lohrey, and Carl Philipp Reh. The smallest grammar problem revisited. IEEE Transaction on Information Theory, 67(1):317–328, 2021. doi:10.1109/TIT.2020.3038147.
  • [6] Michel Bauderon and Bruno Courcelle. Graph expressions and graph rewritings. Mathematical System Theory, 20(2-3):83–127, 1987. doi:10.1007/BF01692060.
  • [7] Christoph Berkholz, Fabian Gerhardt, and Nicole Schweikardt. Constant delay enumeration for conjunctive queries: a tutorial. ACM SIGLOG News, 7(1):4–33, 2020. doi:10.1145/3385634.3385636.
  • [8] Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. Answering FO+MOD queries under updates on bounded degree databases. ACM Transactions on Database Systems, 43(2):7:1–7:32, 2018. doi:10.1145/3232056.
  • [9] Romain Brenguier, Stefan Göller, and Ocan Sankur. A comparison of succinctly represented finite-state systems. In Proceedings of the 23rd International Conference on Concurrency Theory, CONCUR 2012, volume 7454 of Lecture Notes in Computer Science, pages 147–161. Springer, 2012. doi:10.1007/978-3-642-32940-1_12.
  • [10] Katrin Casel, Henning Fernau, Serge Gaspers, Benjamin Gras, and Markus L. Schmid. On the complexity of the smallest grammar problem over fixed alphabets. Theory of Computing Systems, 65(2):344–409, 2021. doi:10.1007/s00224-020-10013-w.
  • [11] Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. IEEE Transactions on Information Theory, 51(7):2554–2576, 2005. doi:10.1109/TIT.2005.850116.
  • [12] Bruno Courcelle and Joost Engelfriet. Graph Structure and Monadic Second-Order Logic - A Language-Theoretic Approach, volume 138 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2012. doi:10.1017/CBO9780511977619.
  • [13] Arnaud Durand and Etienne Grandjean. First-order queries on structures of bounded degree are computable with constant delay. ACM Transactions on Computational Logic, 8(4):21, 2007. doi:10.1145/1276920.1276923.
  • [14] Arnaud Durand, Nicole Schweikardt, and Luc Segoufin. Enumerating answers to first-order queries over databases of low degree. Logical Methods in Computer Science, 18(2), 2022. doi:10.46298/LMCS-18(2:7)2022.
  • [15] Heinz-Dieter Ebbinghaus and Jörg Flum. Finite model theory. Perspectives in Mathematical Logic. Springer, 1995. doi:10.1007/3-540-28788-4.
  • [16] Joost Engelfriet. Context-free graph grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, Volume 3: Beyond Words, pages 125–213. Springer, 1997. doi:10.1007/978-3-642-59126-6_3.
  • [17] Joost Engelfriet, Linda Heyker, and George Leih. Context-free graph languages of bounded degree are generated by apex graph grammars. Acta Informatica, 31(4):341–378, 1994. doi:10.1007/BF01178511.
  • [18] Joost Engelfriet and Grzegorz Rozenberg. A comparison of boundary graph grammars and context-free hypergraph grammars. Information and Computation, 84(2):163–206, 1990. doi:10.1016/0890-5401(90)90038-J.
  • [19] Rachel Faran and Orna Kupferman. LTL with arithmetic and its applications in reasoning about hierarchical systems. In LPAR-22. 22nd International Conference on Logic for Programming, Artificial Intelligence and Reasoning, volume 57 of EPiC Series in Computing, pages 343–362. EasyChair, 2018. doi:10.29007/WPG3.
  • [20] Rachel Faran and Orna Kupferman. A parametrized analysis of algorithms on hierarchical graphs. International Journal on Foundations of Computer Science, 30(6-7):979–1003, 2019. doi:10.1142/S0129054119400252.
  • [21] Jörg Flum and Martin Grohe. Parameterized Complexity Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2006. doi:10.1007/3-540-29953-X.
  • [22] Moses Ganardi and Pawel Gawrychowski. Pattern matching on grammar-compressed strings in linear time. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 2833–2846. SIAM, 2022. doi:10.1137/1.9781611977073.110.
  • [23] Moses Ganardi, Artur Jeż, and Markus Lohrey. Balancing straight-line programs. Journal of the ACM, 68(4):27:1–27:40, 2021. doi:10.1145/3457389.
  • [24] Adrià Gascón, Markus Lohrey, Sebastian Maneth, Carl Philipp Reh, and Kurt Sieber. Grammar-based compression of unranked trees. Theory of Computing Systems, 64(1):141–176, 2020. doi:10.1007/s00224-019-09942-y.
  • [25] Ferenc Gécseg and Magnus Steinby. Tree languages. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, Volume 3: Beyond Words, pages 1–68. Springer, 1997. doi:10.1007/978-3-642-59126-6_1.
  • [26] Stefan Göller and Markus Lohrey. Fixpoint logics on hierarchical structures. Theory of Computing Systems, 48(1):93–131, 2009. doi:10.1007/s00224-009-9227-1.
  • [27] Annegret Habel and Hans-Jörg Kreowski. Some structural aspects of hypergraph languages generated by hyperedge replacement. In Proceedings of the 4th Annual Symposium on Theoretical Aspects of Computer Science, STACS 1987, volume 247 of Lecture Notes in Computer Science, pages 207–219. Springer, 1987. doi:10.1007/BFB0039608.
  • [28] Wojciech Kazana and Luc Segoufin. First-order query evaluation on structures of bounded degree. Logical Methods in Computer Science, 7(2), 2011. doi:10.2168/LMCS-7(2:20)2011.
  • [29] Wojciech Kazana and Luc Segoufin. Enumeration of first-order queries on classes of structures with bounded expansion. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, pages 297–308. ACM, 2013. doi:10.1145/2463664.2463667.
  • [30] Wojciech Kazana and Luc Segoufin. Enumeration of monadic second-order queries on trees. ACM Transactions on Computational Logic, 14(4):25:1–25:12, 2013. doi:10.1145/2528928.
  • [31] Benny Kimelfeld, Wim Martens, and Matthias Niewerth. A formal language perspective on factorized representations. In Proceedings of the 28th International Conference on Database Theory, ICDT 2025, volume 328 of LIPIcs, pages 20:1–20:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPICS.ICDT.2025.20.
  • [32] Stephan Kreutzer and Anuj Dawar. Parameterized complexity of first-order logic. Electronic Colloquium on Computational Complexity, TR09-131, 2009. URL: https://eccc.weizmann.ac.il/report/2009/131.
  • [33] N. Jesper Larsson and Alistair Moffat. Off-line dictionary-based compression. Proceedings of the IEEE, 88(11):1722–1732, 2000. doi:10.1109/5.892708.
  • [34] Thomas Lengauer. Hierarchical planarity testing algorithms. Journal of the ACM, 36(3):474–509, 1989. doi:10.1145/65950.65952.
  • [35] Thomas Lengauer and Klaus W. Wagner. The correlation between the complexities of the nonhierarchical and hierarchical versions of graph problems. Journal of Computer and System Sciences, 44:63–93, 1992. doi:10.1016/0022-0000(92)90004-3.
  • [36] Thomas Lengauer and Egon Wanke. Efficient solution of connectivity problems on hierarchically defined graphs. SIAM Journal on Computing, 17(6):1063–1080, 1988. doi:10.1137/0217068.
  • [37] Leonid Libkin. Elements of Finite Model Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2004. doi:10.1007/978-3-662-07003-1.
  • [38] Markus Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups Complexity Cryptology, 4(2):241–299, 2012. doi:10.1515/gcc-2012-0016.
  • [39] Markus Lohrey. Model-checking hierarchical structures. Journal of Computer and System Sciences, 78(2):461–490, 2012. doi:10.1016/J.JCSS.2011.05.006.
  • [40] Markus Lohrey. Grammar-based tree compression. In Proceedings of the 19th International Conference on Developments in Language Theory, DLT 2015, volume 9168 of Lecture Notes in Computer Science, pages 46–57. Springer, 2015. doi:10.1007/978-3-319-21500-6_3.
  • [41] Markus Lohrey, Sebastian Maneth, and Roy Mennicke. XML tree structure compression using RePair. Information Systems, 38(8):1150–1167, 2013. doi:10.1016/J.IS.2013.06.006.
  • [42] Markus Lohrey, Sebastian Maneth, and Carl Philipp Reh. Constant-time tree traversal and subtree equality check for grammar-compressed trees. Algorithmica, 80(7):2082–2105, 2018. doi:10.1007/s00453-017-0331-3.
  • [43] Markus Lohrey, Sebastian Maneth, and Markus L. Schmid. FO-query enumeration over SLP-compressed structures of bounded degree, 2025. arXiv:2506.19421.
  • [44] Markus Lohrey, Sebastian Maneth, and Manfred Schmidt-Schauß. Parameter reduction and automata evaluation for grammar-compressed trees. Journal of Computer and System Sciences, 78(5):1651–1669, 2012. doi:10.1016/j.jcss.2012.03.003.
  • [45] Markus Lohrey and Markus L. Schmid. Enumeration for MSO-queries on compressed trees. Proceedings of the ACM on Management of Data, 2(2):78, 2024. doi:10.1145/3651141.
  • [46] Sebastian Maneth and Fabian Peternek. Grammar-based graph compression. Information Systems, 76:19–45, 2018. doi:10.1016/J.IS.2018.03.002.
  • [47] Sebastian Maneth and Fabian Peternek. Constant delay traversal of grammar-compressed graphs with bounded rank. Information and Computation, 273:104520, 2020. doi:10.1016/J.IC.2020.104520.
  • [48] Madhav V. Marathe, Harry B. Hunt III, Richard Edwin Stearns, and Venkatesh Radhakrishnan. Approximation algorithms for PSPACE-hard hierarchically and periodically specified problems. SIAM Journal on Computing, 27(5):1237–1261, 1998. doi:10.1137/S0097539795285254.
  • [49] Madhav V. Marathe, Harry B. Hunt III, and S. S. Ravi. The complexity of approximation PSPACE-complete problems for hierarchical specifications. Nordic Journal of Computing, 1(3):275–316, 1994. URL: https://www.cs.helsinki.fi/njc/njc1_papers/number3/paper1.pdf.
  • [50] Madhav V. Marathe, Venkatesh Radhakrishnan, Harry B. Hunt III, and S. S. Ravi. Hierarchically specified unit disk graphs. Theoretical Computer Science, 174(1–2):23–65, 1997. doi:10.1016/S0304-3975(96)00008-4.
  • [51] Martin Muñoz and Cristian Riveros. Constant-delay enumeration for SLP-compressed documents. Logical Methods in Computer Science, 21(1), 2025. doi:10.46298/LMCS-21(1:17)2025.
  • [52] Craig G. Nevill-Manning and Ian H. Witten. Identifying hierarchical structure in sequences: A linear-time algorithm. Journal of Artificial Intelligence Research, 7:67–82, 1997. doi:10.1613/JAIR.374.
  • [53] Dan Olteanu. Factorized databases: A knowledge compilation perspective. In Beyond NP, Papers from the 2016 AAAI Workshop, Phoenix, Arizona, USA, February 12, 2016, 2016. URL: http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12638.
  • [54] Dan Olteanu and Maximilian Schleich. F: regression models over factorized views. Proceedings of the VLDB Endowment, 9(13):1573–1576, 2016. doi:10.14778/3007263.3007312.
  • [55] Dan Olteanu and Maximilian Schleich. Factorized databases. ACM SIGMOD Record, 45(2):5–16, 2016. doi:10.1145/3003665.3003667.
  • [56] Dan Olteanu and Jakub Závodný. Size bounds for factorised representations of query results. ACM Transactions on Database Systems, 40(1):2:1–2:44, 2015. doi:10.1145/2656335.
  • [57] Leonid Peshkin. Structure induction by lossless graph compression. In Proceedings of the 2007 Data Compression Conference, DCC 2007, pages 53–62. IEEE Computer Society, 2007. doi:10.1109/DCC.2007.73.
  • [58] William C. Rounds. Mappings and grammars on trees. Mathematical System Theory, 4(3):257–287, 1970. doi:10.1007/BF01695769.
  • [59] Markus L. Schmid and Nicole Schweikardt. Spanner evaluation over SLP-compressed documents. In Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2021, pages 153–165. ACM, 2021. doi:10.1145/3452021.3458325.
  • [60] Markus L. Schmid and Nicole Schweikardt. Query evaluation over SLP-represented document databases with complex document editing. In Proceedings of the International Conference on Management of Data, PODS 2022, pages 79–89. ACM, 2022. doi:10.1145/3517804.3524158.
  • [61] Nicole Schweikardt, Luc Segoufin, and Alexandre Vigny. Enumeration for FO queries over nowhere dense graphs. Journal of the ACM, 69(3):22:1–22:37, 2022. doi:10.1145/3517035.
  • [62] Detlef Seese. Linear time computable problems and first-order descriptions. Mathematical Structures in Computer Science, 6(6):505–526, 1996. doi:10.1017/S0960129500070079.
  • [63] Luc Segoufin. Constant delay enumeration for conjunctive queries. ACM SIGMOD Record, 44(1):10–17, 2015. doi:10.1145/2783888.2783894.
  • [64] Luc Segoufin and Alexandre Vigny. Constant delay enumeration for FO queries over databases with local bounded expansion. In Proceedings of the 20th International Conference on Database Theory, ICDT 2017, volume 68 of LIPIcs, pages 20:1–20:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPICS.ICDT.2017.20.