Abstract 1 Introduction 2 Preliminaries and Notation 3 From Trans-Enum to Dom-Enum 4 Overview of Algorithm 5 Preprocessing for Enumeration 6 Correctness and Analysis of The Enumeration Algorithm 7 Conclusion References

Enumeration of Minimal Hitting Sets Parameterized by Treewidth

Batya Kenig ORCID Technion, Israel Institute of Technology, Haifa, Israel Dan Shlomo Mizrahi Technion, Israel Institute of Technology, Haifa, Israel
Abstract

Enumerating the minimal hitting sets of a hypergraph is a problem which arises in many data management applications that include constraint mining, discovering unique column combinations, and enumerating database repairs. Previously, Eiter et al. [22] showed that the minimal hitting sets of an n-vertex hypergraph, with treewidth w, can be enumerated with delay O(nw) (ignoring polynomial factors), with space requirements that scale with the output size. We improve this to fixed-parameter-linear delay, following an FPT preprocessing phase. The memory consumption of our algorithm is exponential with respect to the treewidth of the hypergraph.

Keywords and phrases:
Enumeration, Hitting sets
Copyright and License:
[Uncaptioned image] © Batya Kenig and Dan Shlomo Mizrahi; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing Enumeration
Related Version:
Full Version: https://doi.org/10.48550/arXiv.2408.15776 [39]
Funding:
This work was supported by the US-Israel Binational Science Foundation (NSF-BSF) Grant No. 2020751.
Editors:
Sudeepa Roy and Ahmet Kara

1 Introduction

A hypergraph is a finite collection () of sets called hyperedges over a finite set V() of nodes. A transversal (or hitting set) of is a set 𝒮V() that has a non-empty intersection with every hyperedge of . A transversal 𝒮 is minimal if no proper subset of 𝒮 is a transversal of . The problem of enumerating the set of minimal transversals of a hypergraph (called Trans-Enum for short) is a fundamental combinatorial problem with a wide range of applications in Boolean algebra [23], computational biology [57], drug-discovery [56], and many more (see surveys [21, 29, 20]).

The role of Trans-Enum in data management applications can be traced back to the 1987 paper by Mannila and Räihä [50]. This stems from the fact that finding minimal or maximal solutions (with respect to some property) is an essential task in many data management applications. In many cases, such tasks are easily translated, or even equivalent, to Trans-Enum. These include, for example, the discovery of inclusion-wise minimal Unique Column Combinations (UCCs), which are natural candidates for primary keys [8, 9], and play an important role in data profiling and query optimization (see surveys by Papenbrock, Naumann, and coauthors [1][45]). Eiter and Gottlob have proved that UCC discovery and Trans-Enum are, in fact, equivalent [20]. Other data profiling tasks that can be reduced to trans-enum include discovering maximal frequent itemsets [33], functional dependencies [50, 59, 46], and multi-valued dependencies [40, 41]. Another data-management task that directly translates to Trans-Enum is that of enumerating database repairs, which are maximal sets of tuples that exclude some combinations of tuples based on logical patterns (e.g., functional dependencies, denial constraints etc.) [12, 7, 42, 48]. The forbidden combinations of tuples are represented as the hyperedges in a conflict hypergraph [11, 12], and the problem of enumerating the repairs is reduced to that of enumerating the minimal transversals of the conflict hypergraph [48].

An edge cover of a hypergraph is a subset of hyperedges 𝒟() such that every vertex vV() is included in at least one hyperedge in 𝒟. An edge cover 𝒟 is minimal if no proper subset of 𝒟 is an edge cover. Minimal edge covers are used to generate covers of query results, which are succinct, lossless representations of query results [35]. Our results for trans-enum carry over to the problem of enumerating minimal edge-covers (cover-enum).

A hypergraph can have exponentially many minimal hitting sets, ruling out any polynomial-time algorithm. Hence, the yardstick used to measure the efficiency of an algorithm for Trans-Enum is the delay between the output of successive solutions. An enumeration algorithm runs in polynomial delay if the delay is polynomial in the size of the input. Weaker notions are incremental polynomial delay if the delay between the ith and the (i+1)st solutions is bounded by a polynomial of the size of the input plus i, and polynomial total time if the total computation time is polynomial in the sizes of both the input and output. While there exists a quasi-polynomial total time algorithm [27], whether Trans-Enum has a polynomial total time algorithm is a major open question [23]. In the absence of a tractable algorithm for general inputs, previous work has focused on special classes of hypergraphs. In particular, Trans-Enum admits a polynomial total time algorithm when restricted to acyclic hypergraphs, and hypergraphs with bounded hyperedge size [10].

In this paper, we consider the trans-enum problem parameterized by the treewidth of the input hypergraph. The treewidth of a graph is an important graph complexity measure that occurs as a parameter in many Fixed Parameter Tractable (FPT) algorithms [26]. An FPT algorithm for a parameterized problem solves the problem in time O(f(k)nc), where n is the size of the input, f(k) is a computable function that depends only on the parameter k, and c is a fixed constant. The treewidth of a hypergraph is defined to be the treewidth of its associated node-hyperedge incidence graph [22] (formal definitions in Section 2). Our main result is that for a hypergraph , with n vertices, m hyperedges, and treewidth w, we can enumerate the minimal hitting sets of with fixed-parameter-linear-delay O((n+m)w), following an FPT preprocessing-phase that takes time O((n+m)k5k), where wk2w. The memory requirement of our algorithm is in O((n+m)5k).

Previously, Eiter et al. [22] showed that the minimal transversals of an n-vertex hypergraph with treewidth w can be enumerated with delay O(nw+1), where is the size of the hypergraph (formal definition in Section 2), and w its treewidth. When parameterized by the treewidth of the input hypergraph, our result provides a significant improvement: the minimal transversals can be enumerated in fixed-parameter-linear delay O(w), following an FPT preprocessing phase. Ours also improves on the memory consumption which, for the algorithm of Eiter et al. [22], scales with the output size, while ours is polynomial for bounded-treewidth hypergraphs.

Overview of approach, and new techniques

We build on the well known relationship between the problem of enumerating minimal hypergraph transversals (trans-enum) and enumerating minimal dominating sets in graphs. A subset of vertices SV(G) is a dominating set of a graph G if every vertex in V(G)S has a neighbor in S. A dominating set is minimal if no proper subset of S is a dominating set of G. We denote by Dom-Enum the problem of enumerating the minimal dominating sets of G. There is a tight connection between Dom-Enum and trans-enum; Kanté et al. [34] showed that Dom-Enum can be solved in polynomial-total-time if and only if trans-enum can as well. Building on this result, we show in Section 3 that trans-enum over a hypergraph with treewidth w can be reduced to Dom-Enum over a graph with treewidth w{w,w+1}. Therefore, we focus on an algorithm for enumerating minimal dominating sets, where the delay is parameterized by the treewidth. In that vein, we mention the recent work by Rote [54], that presented an algorithm for enumerating the minimal dominating sets of trees.

The enumeration algorithm has two phases: preprocessing and enumeration. In the preprocessing phase, we construct a data structure that allows us to efficiently determine whether graph G has a minimal dominating set S meeting specified constraints. During enumeration, the algorithm queries this structure repeatedly, backtracking on a negative answer. This method of enumeration is known as Backtrack Search [52]. Our algorithm leverages the treewidth parameter w in two ways: constructing the data structure in fixed-parameter-tractable time and space dependent on w, and querying it such that each query answers in time O(w). Overall, for an n-vertex graph, this yields a delay of O(nw).

Central to our approach is a novel data structure based on the disjoint branch tree decomposition. Graphs with such a tree decomposition were first characterized by Gavril, and are called rooted directed path graphs [30]. The class of hypergraphs that have a disjoint branch tree decomposition strictly include the class of γ-acyclic hypergraphs, and are strictly included in the class of β-acyclic hypergraphs [19]. Hypergraphs that have a disjoint-branch tree decomposition have proved beneficial for both schema design in relational databases [24, 19], and query processing in probabilistic databases [6, 37, 36, 38]. To our knowledge, ours is the first to make use of this structure for enumeration. Dynamic programming over tree decompositions breaks down complex problems into smaller subproblems associated with the nodes (or bags) of the tree decomposition. These subproblems are solved independently and combined in a bottom-up manner according to the tree structure. The challenge lies in merging solutions of multiple subtrees at each node, which can be complex and computationally intensive. Disjoint-branch tree decompositions simplify this by ensuring each node’s children have disjoint vertex sets, easing the enforcement of specific constraints placed on the vertices. This is especially important in the case of Dom-Enum where the constraints imposed on the vertices have the form of “vertex v has at least two neighbors in the solution set”. Intuitively, it is easier and more efficient to ensure such constraints hold when the structure partitions v’s neighbors into disjoint sets. For the detailed application of this property for Dom-Enum see Section 5.2 and Example 5.2.

Our enumeration algorithm does not assume that the (incidence graph) of the input hypergraph has a disjoint branch tree decomposition [19]. Therefore, as part of the preprocessing phase, we construct a disjoint branch tree decomposition directly from the tree decomposition of the input graph. This new structure contains “vertices” that are not in the original input graph, and whose sole purpose is to facilitate the enumeration. In Section 5.2 we show that given a tree decomposition (that is not disjoint-branch) whose width is w, we can construct the required disjoint-branch data-structure in time O(nw2), with the guarantee that the width of the resulting structure is, effectively, at most 2w. The main results of this paper are the following.

Theorem 1.

Let G be an n-vertex graph whose treewidth is w. Following a preprocessing phase that takes time O(nw82w), the minimal dominating sets of G can be enumerated with delay O(nw) and total space O(n82w).

The translation from the Trans-Enum (cover-enum) problem in the hypergraph to the Dom-Enum problem in the bipartite incidence-graph of (see Section 3) imposes certain restrictions on the set of vertices that can belong to the solution set (i.e., the minimal transversal or edge-cover). As a corollary of Theorem 1, we show the following (in Section 5.3).

Corollary 2.

Let be a hypergraph with n vertices, m hyperedges, and treewidth w. Following a preprocessing phase that takes time O((m+n)w52w), the minimal transversals (edge-covers) of can be enumerated with delay O((m+n)w) and total space O((m+n)52w).

Relation to the meta-theorem of Courcelle [13].

The seminal meta-theorem by Courcelle [13, 3] states that every decision and optimization graph problem definable in Monadic Second Order Logic (MSO) is FPT (and even linear in the input size) when parameterized by the treewidth of the input structure. MSO is the extension of First Order logic that allows quantification (i.e., ,) over sets. Bagan [4], Courcelle [14], and Amarilli et al. [2] extended this result to enumeration problems, and showed that the delay between successive solutions to problems defined in MSO logic is fixed-parameter linear, when parameterized by the treewidth of the input structure.

Courcelle-based algorithms for model-checking, counting, and enumerating the assignments satisfying an MSO formula φ have the following form. Given a TD (𝒯,χ) of a graph G, it is first converted to a labeled binary tree T over a fixed alphabet that depends on the width of the TD. The binary tree T has the property that Gφ if and only if Tφ, where φ is derived from φ (e.g., by the algorithm in [25]). Next, φ is converted to a finite, deterministic tree automaton (FTA) using standard techniques [55, 17]. The generated FTA accepts the binary tree T if and only if Tφ. Since simulating the deterministic FTA on T takes linear time, this proves that Courcelle-based algorithms run in time O(nf(φ,w)), where n=|V(G)|, w is the width of the TD (𝒯,χ), and φ is the length of φ.

The main problem with Courcelle-based algorithms is that the function f(φ,w) is an iterated exponential of height O(φ). The problem stems from the fact that every quantifier alternation in the MSO formula (i.e., and ) induces a power-set construction during the generation of the FTA. Therefore, an MSO formula with k quantifier alternations, will result in a deterministic FTA whose size is a k-times iterated exponential. Various lower-bound results have shown that this blowup is unavoidable [53] even for model-checking over trees. According to Gottlob, Pichler and Wei [31], the problem is even more severe on bounded treewidth structures because the original (possibly simple) MSO-formula φ is first transformed into an equivalent MSO-formula φ over trees, which is more complex than the original formula φ, and in general will have additional quantifier alternations.

MSO-logic is highly expressive, and can express complex graph properties. The condition that XV(G) is a minimal dominating set in an undirected graph G can be specified using MSO logic. You can see the exact formulation (based on Proposition 4) in the full version [39]. The MSO formula for a minimal dominating set has two quantifier alternations. By the previous, the size of an FTA representing this formula will be at least doubly exponential. Previous analysis and experiments that attempted to craft the FTA directly (i.e., without going through the generic MSO-to-FTA conversion) have shown that this blowup materializes even for very simple MSO formulas [47, 44, 31, 58]. Moreover, Frick and Grohe [28] prove that no general algorithm can perform better than the FTA-based approach.

Despite evidence pointing to the contrary, it may be the case that with some considerable effort our algorithm can be formulated using an FTA with comparable runtime guarantees. The only instance we are aware of where a generic approach achieves running times comparable to those of a specialized algorithm is the algorithm of Gottlob, Pichler, and Wei [32] that translates an MSO formula to a monadic Datalog program (not an FTA). However, there too, the monadic Datalog programs were crafted manually, and used to directly encode the dynamic programming over the TD [32, 47].

Because of these limitations, practical algorithms over bounded-treewidth structures often avoid the MSO-to-FTA conversion and directly encode dynamic programming strategies on the tree decomposition [31, 49]. These specialized algorithms are not only more efficient, but are also transparent, allowing for exact runtime analysis, and enable proving optimality guarantees by optimizing the dependence on treewidth [49, 5, 18]. See the full version of this paper for more details [39].

Organization.

Following preliminaries in Section 2, we show in Section 3 how trans-enum and cover-enum over a hypergraph with treewidth w can be reduced to dom-enum over a graph with treewidth ww+1. In Section 4 we present an overview of the algorithm. In Section 5, we describe the FPT preprocessing-phase, and in Section 6 prove correctness, and establish the FPL-delay guarantee of the enumeration algorithm. We conclude in Section 7. Due to space constraints, we defer some of the technical details and proofs to the full version of this paper [39], which also contains a discussion of lower bounds, and other related work.

2 Preliminaries and Notation

Let G be an undirected graph with vertices V(G) and edges E(G), where n=|V(G)|, and m=|E(G)|. We assume that G does not have self-loops. Let vV(G). We let NG(v)=def{uV(G):(u,v)E(G)} denote the neighborhood of v, dropping the subscript G when it is clear from the context. We denote by G[T] the subgraph of G induced by a subset of vertices TV(G). Formally, V(G[T])=T, and E(G[T])={(u,v)E(G):{u,v}T}.

Let UV(G), and let F be a finite set of labels. We denote by φ:UF an assignment of labels to the vertices U. For example, let U={v1,v2}, F={α,β}, and φ be an assignment where φ(v1)=α,φ(v2)=α. If tV(G)U, we denote by φ=def(φ,tβ) the assignment that results from φ by extending it with the assignment of the label β to vertex t. That is, φ(v)=φ(v) for every vU, and φ(t)=β. Finally, for UU, we denote by φ[U]:UF the assignment where φ[U](w)=φ(w) for all wU.

Definition 3.

A subset of vertices DV(G) is a dominating set of G if every vertex vV(G)D has a neighbor in D. A dominating set D is minimal if D is no longer a dominating set for every DD. We denote by 𝒮(G) the minimal dominating sets of G.

A hypergraph is a pair (V(),()), where V(), called the vertices of , is a finite set and ()2V(){} are called the hyperedges of . The dual of a hypergraph , denoted d, is the hypergraph with vertices V(d)=def{ye:e()}, and hyperedges (d)=def{fv:vV()}, where fv=def{yeV(d):ve}.

Definition 4 ([22](Incidence Graph)).

The incidence graph of a hypergraph , denoted I(), is the undirected, bipartite graph with vertex set V(I())=defV(){ye:e()} and edge set E(I())=def{(x,ye):xV(),e(), and xe}.

Observe that a hypergraph and its dual d have the same incidence graph. The size of a hypergraph , denoted , is |V()|+e()|e|. A transversal of a hypergraph is a subset V() that intersects every hyperedge e(). That is, e for every e(). It is a minimal transversal if it does not contain any transversal as a proper subset. It is well known that a transversal V() is minimal if and only if for every u, there exists a hyperedge eu(), such that eu={u} [51]. An edge cover of a hypergraph is a subset 𝒟() such that e𝒟e=V(). It is a minimal edge cover if it does not contain any edge cover as a proper subset. The following shows that trans-enum and cover-enum are equivalent.

Proposition 5.

A subset 𝒟() is a minimal edge cover of if and only if the set {ye:e𝒟} is a minimal transversal of d.

Definition 6 (Tree Decomposition (TD)).

A Tree Decomposition (TD) of a graph G is a pair (𝒯,χ) where 𝒯 is an undirected tree with nodes V(𝒯), edges E(𝒯), and χ is a function that maps each uV(𝒯) to a set of vertices χ(u)V(G), called a bag, such that:

  1. 1.

    uV(𝒯)χ(u)=V(G).

  2. 2.

    For every (s,t)E(G) there is a node uV(𝒯) such that {s,t}χ(u).

  3. 3.

    For every vV(G) the set {uV(𝒯)vχ(u)} is connected in 𝒯; this is called the running intersection property.

The width of (𝒯,χ) is the size of the largest bag minus one (i.e., max{|χ(u)|:uV(𝒯)}1). The tree-width of G, denoted tw(G), is the minimal width of a TD for G.

There is more than one way to define the treewidth of a hypergraph [22]. In this work, the treewidth of a hypergraph is defined to be the treewidth of its incidence graph I() [22].

Let (𝒯,χ) be a TD of a graph G. Root the tree at node rV(𝒯) by directing the edges E(𝒯) from r to the leaves. We say that such a TD is rooted; in notation (𝒯r,χ). For a node uV(𝒯r), we denote by pa(u) its parent in 𝒯r (where pa(u)=defnil if u is the root node), by 𝒯u the subtree of 𝒯r rooted at node u, by Vu=deftV(𝒯u)χ(t), and by Gu the graph induced by Vu (i.e., Gu=defG[Vu]). By this definition, Gr=G. We will use “vertices” to refer to the vertices of the graph G, and “nodes” to refer to the vertices of the TD.

Definition 7 (Disjoint-Branch TD (DBTD) [19, 30]).

A rooted TD (𝒯r,χ) is disjoint branch if, for every uV(𝒯r), with children u1,,uk, it holds that χ(ui)χ(uj)= for all ij. A TD (𝒯,χ) is disjoint branch if it has a node rV(T) such that (𝒯r,χ) is disjoint branch.

First characterized by Gavril [30], graphs that have a disjoint branch TD are called rooted directed path graphs. In [15], disjoint branch TDs are called special tree decompositions. For the purpose of enumeration, our algorithm constructs a disjoint-branch TD whose width is at most twice the treewidth of the input graph G. It is common practice to formulate dynamic programming algorithms over TDs that are nice [43].

Definition 8 (Nice Tree Decomposition [43, 16]).

A nice TD is a rooted TD (𝒯,χ) with root node r, in which each node uV(𝒯) is one of the following types:

  • Leaf node: a leaf of 𝒯 where χ(u)=.

  • Introduce node: has one child node u where χ(u)=χ(u){v}, and vχ(u).

  • Forget node: has one child node u where χ(u)=χ(u){v}, and vχ(u).

  • Join node: has two child nodes u1, u2 where χ(u)=χ(u1)=χ(u2).

Given a TD (Definition 2) of a graph G of width w, one can in time O(w2|V(G)|) compute a nice tree decomposition of G of width w that has at most O(w|V(G)|) nodes [16].

3 From Trans-Enum to Dom-Enum

In this section, we show how the trans-enum problem over a hypergraph with treewidth w is translated to the dom-enum problem over a graph G with treewidth ww+1. Due to the equivalence between trans-enum and cover-enum (Proposition 2), the translation also carries over to the cover-enum problem. We associate with the hypergraph a tripartite graph B() that is obtained from the incidence graph I() (Definition 2) by adding a new vertex that is made adjacent to all vertices in V(). Formally, the nodes of B() are defined V(B())=defV(I()){v} where vV(I()), and E(B())=defE(I()){(u,v):uV()}. Note that for every vertex in {ye:e()}, it holds that NB()(ye)=e, NB()(v)=V(), and vNB()(ye), for every e(). The construction of B() takes time O(), and is clearly polynomial.

Theorem 9.

A set V() is a minimal transversal of if and only if {v} is a minimal dominating set of B().

If =V() is a minimal transversal of , then it is the unique minimal transversal of . Consequently, V() is the unique minimal dominating set of B() that is contained in V(){v}. Hence, the problem of enumerating the minimal transversals of is reduced to that of enumerating the minimal dominating sets of B() that exclude the vertex-set {ye:e()}, and include the vertex v. Symmetrically, the problem of enumerating the minimal edge covers of is reduced to enumerating the minimal dominating sets of the tripartite graph C() that results from I() by adding a new vertex vV(()) as a neighbor to all vertices {ye:e()}, and enumerating the minimal dominating sets of C() that exclude the vertex-set {u:uV()}, and include the vertex v. In Section 5.3, we show how these restrictions are encoded.

Suppose that is a hypergraph with treewidth k. That is, the treewidth of the incidence graph tw(I())=k [22]. Next, we show that the treewidth of B() and C() is at most k+1. This claim is directly implied by the following lemma.

Lemma 10.

Let (𝒯,χ) be a tree-decomposition for the graph G, and let UV(G). Let G be the graph that results from G by adding a vertex vV(G) as a neighbor to all vertices U. That is, V(G)=V(G){v} and E(G)=E(G){(v,w):wU}. If the width of (𝒯,χ) is k, then G has a tree-decomposition whose width is kk+1.

The reduction from Trans-Enum to Dom-Enum of Kanté et al. [34] does not preserve the bounded-treewidth property, and generates a graph whose treewidth is O(n), regardless of the treewidth of the hypergraph tw(I()). Our reduction guarantees that the increase to the treewidth is at most one. That is, in our reduction tw(B())tw(I())+1 (and tw(C())tw(I())+1), hence preserving the bounded treewidth property.

4 Overview of Algorithm

In this section, we provide a high-level view of the algorithm for enumerating the minimal dominating sets of a graph G, which applies the following simple characterization.

Proposition 11.

A dominating set DV(G) is minimal if and only if for every uD, one of the following holds: (i) N(u)D=, or (ii) there exists a vertex vN(u)D, such that N(v)D={u}.

Definition 12 ([51](Private Neighbor)).

Let DV(G). A vertex vV(G)D is a private neighbor of D if there exists a vertex uD such that N(v)D={u}. In this case we say that v is u’s private neighbor. By private neighbors of D we refer to the vertices in V(G)D that are private neighbors of D.

Another way to state Proposition 4 is to say that D is a minimal dominating set if and only if D is dominating, and for every uD that does not have a private neighbor, it holds that N(u)D=. In other words, if D𝒮(G), then every vertex uD either has a private neighbor vV(G)D where N(v)D={u}, or N(u)D=. Therefore, the minimal dominating set D𝒮(G) partitions the vertices V(G) into four categories (or labels):

  1. 1.

    Vertices in D that have a private neighbor in V(G)D are labeled [1]σ.

  2. 2.

    Vertices vD that have no private neighbor in V(G)D, (N(v)D=), are labeled σI.

  3. 3.

    Vertices vV(G)D that are private neighbors of D, (|N(v)D|=1), are labeled [1]ω.

  4. 4.

    All vV(G)D that are not private neighbors of D, (|N(v)D|2), are labeled [2]ρ.

It is easy to see that distinct minimal dominating sets induce distinct labelings to the vertices of G. Therefore, we view D as assigning labels to vertices according to their category.

Example 13.

Consider the graph G in Figure 3(a), and the minimal dominating set D={d,f,i}. Vertices a,b,c,e, and h have exactly one neighbor in D, and hence are assigned the label [1]ω; since N(g)D={f,i}, then g is assigned the label [2]ρ. The private neighbors of d are {b,c,e}, and the private neighbors of f are {a,h}. Therefore, vertices d and f are assigned label [1]σ. Finally, i is assigned label σI because it has no private neighbors (i.e., |N(g)D|2), and N(i)D=.

Next, we refine our labeling as follows. Let V(G)={v1,,vn}, where the indices represent a complete ordering of V(G) (in Section 4.1 we discuss this ordering and its properties). We let Vi=def{v1,,vi} denote the first in vertices in the ordering.

Definition 14 (Labels Induced by minimal dominating set).

Let D𝒮(G) be a minimal dominating set. We define the labels induced by D on Vi=def{v1,,vi} as follows.

  1. 1.

    Vertices in DVi that have a private neighbor in V(G)D. Vertex vVi is labeled [0]σ if and only if vD, v has a private neighbor in ViD, and no private neighbors in V(G)Vi; it is labeled [1]σ if and only if vD, and it has a private neighbor in V(G)Vi.

  2. 2.

    A vertex vDVi is labeled σI if and only if v has no private neighbors in V(G)D, and N(v)D=.

  3. 3.

    Vertices vViD that are private neighbors of D, and hence |N(v)D|=1. Vertex vViD is labeled [0]ω if and only if v has a single neighbor in ViD. Otherwise, v has a single neighbor in (V(G)Vi)D, and is labeled [1]ω.

  4. 4.

    Vertices vViD where |N(v)D|2. Let j=defmin{2,|N(v)ViD|}. Then v has at least 2j neighbors in D(V(G)Vi), and is labeled [2j]ρ.

Overall, we have eight labels that we partition to three groups: Fσ=def{[0]σ,[1]σ,σI}, Fω=def{[0]ω,[1]ω}, and Fρ=def{[0]ρ,[1]ρ,[2]ρ}. We let F=defFσFωFρ.

Definition 15 (Minimal Dominating set consistent with assignment; extendable assignment).

Let θi:ViF denote a labeling to Vi. A minimal dominating set D𝒮(G) is consistent with θi if, for every vVi, the label that D induces on v is θi(v) (Definition 4). We say that the labeling θi is extendable if there exists a minimal dominating set D𝒮(G) that is consistent with θi.

Example 16.

Consider the path v1v2v3. The labeling θ1:{v1σI} is extendable because the minimal dominating set D1={v1,v3} is consistent with θ1; vertex v1D1, v1 does not have any private neighbors in V(G)D1={v2} (because v2 has two neighbors in D1), and N(v1)D1=. On the other hand, the assignment θ2:{v1[0]σ,v2[0]ω} is not extendable because the only minimal dominating set that contains v1, and excludes v2, must contain v3, thus violating the constraint that v2 have no dominating neighbors among V(G)V2={v3}, or a single neighbor in the minimal dominating set (item (3) of Definition 4). Also, θ1:{v1[1]σ} is not extendable for the same reason; v1 must contain a private neighbor, which can only be v2. If v2 is not part of the solution, then v3 must be included in it. But then, v2 is no longer a private neighbor because both its neighbors (i.e., v1 and v3) are dominating, thus violating the constraint θ1(v1)=[1]σ.

We denote by θ0 the empty assignment, which is vacuously extendable. For every i[1,n], we denote by 𝚯i the set of extendable assignments θi:{v1,,vi}F. Formally:

𝚯i=def{θi:{v1,,vi}F:θi is extendable} i[0,n] (1)

Let θi:ViF be an assignment. For brevity, we denote by Vσ(θi), and Vω(θi) the vertices in Vi that are assigned a label in Fσ and Fω, respectively. Formally, Vσ(θi)=def{vVi:θi(v)Fσ}, Vω(θi)=def{vVi:θi(v)Fω}, and Vρ(θi)=def{vVi:θi(v)Fρ}.

Lemma 17.

The following holds: 𝒮(G)={Vσ(θn):θn𝚯n}.

Proof.

By bidirectional inclusion. Let θn𝚯n be an extendable labeling. Then there exists a D𝒮(G) that is consistent with θn. By Definition 4 (items (1) and (2)), this means that DVσ(θn). If vVσ(θn), then by the consistency of D, it holds that vD. Therefore, DVσ(θn), and thus D=Vσ(θn). Hence, Vσ(θn)𝒮(G).

Now, let D𝒮(G). We build the labeling θn:V(G)F as follows. For every vV(G)D, such that |N(v)D|2 we assign the label [0]ρ. For every vV(G)D, such that |N(v)D|=1 we assign the label [0]ω. Since D is dominating, then the the remaining (unlabeled) vertices must belong to D. We assign the label [0]σ to every vD that has a private neighbor in V(G)D (i.e., v has a neighbor assigned [0]ω). By Proposition 4, the remaining vertices in D do not have any neighbors in D, and hence are assigned σI. By construction, D is consistent with θn, and hence θn𝚯n. Also by construction, Vσ(θn)=D. Therefore, D{Vσ(θn):θn𝚯n}. According to Lemma 4, we describe an algorithm for enumerating the set {Vσ(θn):θn𝚯n}. We fix a complete order Q=v1,v2,,vn over V(G). In the preprocessing phase, the algorithm constructs a data structure that receives as input a labeling θi:ViF, and in time O(tw(G)) returns 𝚝𝚛𝚞𝚎 if and only if θi is extendable. Here, Vi={v1,,vi} are the first i vertices of Q.

The outline of the enumeration algorithm is presented in Figure 1 (ignore the algorithm in Figure 2 for now). It is initially called with the empty labeling θ0, which is vacuously extendable, and the first vertex in the order v1. If i=n+1, then the set Vσ(θn) is printed in line 2. The algorithm then iterates over all eight labels in F, and for each label ciF attempts to increment the extendable labeling θi1:Vi1F, to one or more labelings θi:ViF, where θi(vi)=ci. Assigning label ci to vi triggers an update to the labels of vi’s neighbors in Vi1 (i.e., N(vi)Vi). The details of this update are specified in the procedure IncrementLabeling described in Section 4.2. The enumeration algorithm then utilizes the data structure , built in the preprocessing phase. In line 6, the algorithm queries the data structure , whether the resulting labeling θi is extendable. If and only if this is the case, does the algorithm recurs with the pair (vi+1,θi) in line 7. Since the algorithm is initially called with the empty labeling, which is extendable, it follows that it only receives as input pairs (θi1,vi) where θi1𝚯i1 is extendable. From this observation, we prove the following in the full version of this paper [39].

Theorem 18.

For all i{1,,n+1}, EnumDS is called with the pair (θi1,vi) if and only if θi1𝚯i1 is extendable.

A corollary of Theorem 18 is that EnumDS is called with the pair (θn,vn+1) if and only if θn𝚯n. From Lemma 4, we get that EnumDS prints DV(G) if and only if D𝒮(G).

Proposition 19.

Let Q=v1,,vn be a complete order over V(G) such that the following holds for every i{1,,n}:

  1. 1.

    IncrementLabeling runs in time O(tw(G)) for every pair (θi1,vi) where θi1:Vi1F, and returns a constant number of labelings θi:ViF.

  2. 2.

    IsExtendable runs in time O(tw(G)).

Then the delay between the output of consecutive items in algorithm EnumDS is O(ntw(G)).

In the following, we describe an order Q=v1,,vn, and a data structure that meet the conditions of Proposition 4. We also show how can be built in time O(n|F|2tw(G)).

Figure 1: Algorithm for enumerating minimal dominating sets in FPL-delay when parameterized by treewidth.
Figure 2: Returns true if and only if θi𝚯i is extendable.

4.1 Ordering Vertices for Enumeration

Let (𝒯,χ) be a TD rooted at node u1. Define P=defu1,,u|V(𝒯)| to be a depth first order of V(𝒯). For every i>1, pa(ui) is a node ujV(𝒯), with j<i closer to the root u1. Define B:V(G){1,,|V(𝒯)|} to map every vV(G) to the earliest bag uV(𝒯) in the order P, such that vχ(u). By the running intersection property of TDs, the mapping B:V(G){1,,|V(𝒯)|} is well defined; every vertex vV(G) is assigned a single node in V(𝒯). Let Q=v1,,vn be a complete order of V(G) that is consistent with B, such that if B(vi)<B(vj) then viQvj. There can be many such complete orders consistent with B, and we choose one arbitrarily. With some abuse of notation, we denote by B(vi) both the identifier of the node in the TD, and the bag χ(B(vi)). The enumeration algorithm EnumDS in Figure 1 follows the order Q.

Example 20.

Consider G in Figure 3(a), and its nice TD in Figure 3(b). Then B(c)=u, and agbcde in the order Q. Also, see the mapping B applied to the vertices of G.

Lemma 4.1 establishes an important property of the order Q, that is crucial for obtaining a delay of O(nw), where w is the width of the TD. Proof is deferred to [39].

Lemma 21.

Let vi be the ith vertex in Q=v1,,vn. Then N(vi){v1,,vi1}B(vi).

4.2 The IncrementLabeling Procedure

Let θi:ViF be an assignment to the first i vertices of V(G) according to the complete order Q over V(G) defined in Section 4.1. Recall that Vσ(θi), Vω(θi), and Vρ(θi) are the vertices in Vi that are assigned a label in Fσ, Fω, and Fρ by θi, respectively. Likewise, for every aF, we denote by Va(θi) the vertices in Vi that are assigned label a in θi (e.g., VσI(θi) and V[0]ω(θi) are the vertices in Vi assigned labels σI and [0]ω respectively in θi). The following proposition follows almost immediately from Definition 4, and the definition of an extendable assignment (Definition 4).

Proposition 22.

Let θi:ViF be an extendable assignment. For every vjVi it holds:

  1. 1.

    If θi(vj)=σI, then θi(w)Fρ for every wN(vj)Vi.

  2. 2.

    If θi(vj)=[x]ω where x{0,1}, then |N(vj)VσI(θi)|=0, and |N(vj)Vσ(θi)|=1x.

  3. 3.

    If θi(vj)=[1]σ, then |N(vj)VσI(θi)|=0, and |N(vj)V[1]ω(θi)|=0.

  4. 4.

    If θi(vj)=[0]σ, then |N(vj)VσI(θi)|=0, |N(vj)V[1]ω(θi)|=0, and |N(vj)V[0]ω(θi)|1.

  5. 5.

    If θi(vj)=[x]ρ where x{1,2}, then |N(vj)Vσ(θi)|=2x; and if θi(vj)=[0]ρ, then |N(vj)Vσ(θi)|2.

IncrementLabeling receives as input a labeling θi1:Vi1F, which we assume to be extendable (see EnumDS in Figure 1), and a label ciF. It generates at most two new assignments θi:ViF, where (1) θi(w)=θi1(w) for all wN(vi), and (2) For all wN(vi)Vi it holds that θi1(w)Va(θi1) if and only if θi(w)Va(θi) where a{σI,σ,ω,ρ}, and (3) θi(vi)=ci. The procedure updates the labels of vertices wN(vi)Vi based on the value ci, so that the conditions of Proposition 4.2 are maintained in θi. If the conditions of Proposition 4.2 cannot be maintained following the assignment of label ci to vi, then IncrementLabeling returns an empty set, indicating that θi1 cannot be incremented with the assignment θi(vi)ci while maintaining its extendability.

The pseudocode of IncrementLabeling is deferred to the full version of this paper [39]. In what follows, we illustrate with an example. If ciFσ, and wN(vi)Vi where θi1(w)=[1]ρ, then w’s label is updated to θi(w)=[0]ρ (item (5) of Proposition 4.2). If ci=[0]σ, and θi1(w)=[1]ω, then w’s label is updated to θi(w)=[0]ω (item (2) of Proposition 4.2). On the other hand, if θi1(w)=[0]ω, then by item (2) of Proposition 4.2, it means that |N(w)Vσ(θi1)|=1. Therefore, if ciFσ, then |N(w)Vσ(θi)|=2, thus violating item (2) of Proposition 4.2 with respect to θi. In this case, the procedure will return an empty set, indicating that θi1 cannot be augmented with vici, while maintaining the conditions of Proposition 4.2, which are required for the labeling to be extendable (Definition 4).

Lemma 23.

Procedure IncrementLabeling receives as input an extendable assignment θi1:Vi1F to the first i1 vertices of V(G) according to the complete order Q, and a value ciF. There exist at most two extendable assignments θi:ViF where (1) θi(w)=θi1(w) for all wN(vi), and (2) For all wN(vi)Vi it holds that θi1(w)Va(θi1) if and only if θi(w)Va(θi) where a{σI,σ,ω,ρ}, and (3) θi(vi)=ci, that will be returned by IncrementLabeling. If no such extendable assignment exists, the procedure will return the empty set.

Lemma 24.

The runtime of procedure IncrementLabeling with input (θi1,ci) where vjQvi for all j{1,,i1} is O(tw(G)).

The proofs of Lemmas 4.2 and 4.2 are deferred to the full version of this paper [39]. Proposition 4 defines two conditions sufficient for obtaining FPL-delay of O(ntw(G)). In Section 4.1 we introduced the ordering Q=v1,,vn, which guarantees that the runtime of IncrementLabeling is in O(tw(G)) (Lemma 4.2), thus meeting the first condition of Proposition 4. We now proceed to the second.

((a)) G,Gu.
((b)) Disregarding the dotted edges, this is a nice TD (𝒯,χ) for G. The nice disjoint branch TD for G results from adding the dotted edges, and removing nodes t0, t1, and their adjacent edges. The subtree 𝒯u corresponds to the induced graph Gu=defG[Vu], where Vu={a,b,c,d,e}. An order Q corresponding to this nice TD is Q=h,f,a,g,i,b,c,d,e.
((c)) Trie for factor u:Fχ(u){0,1} for node u in the TD in Figure 3(b). The order used to construct the trie is abc which is compatible with the order Q derived from the TD in Figure 3(b).
Figure 3: A graph G, its nice TD, and example of a trie representing a factor of the TD.

5 Preprocessing for Enumeration

The data structure generated in the preprocessing-phase is a nice disjoint-branch tree-decomposition.

Definition 25 (Nice Disjoint-Branch Tree Decomposition (Nice DBTD)).

A nice DBTD is a rooted TD (𝒯,χ) with root node r, in which each node uV(𝒯) is one of the following types:

  • Leaf node: a leaf of 𝒯 where χ(u)=.

  • Introduce node: has one child node u where χ(u)=χ(u){v}, and vχ(u).

  • Forget node: has one child node u where χ(u)=χ(u){v}, and vχ(u).

  • Disjoint Join node: has two child nodes u1, u2 where χ(u)=χ(u1)χ(u2) and χ(u1)χ(u2)=.

Figure 3(b) presents an example of a nice DBTD. The nodes of the generated nice DBTD are associated with factor tables (described in Section 5.1). Every tuple in the factor table associated with node uV(𝒯) is an assignment φu:χ(u)F to the vertices of χ(u), where F is the set of labels described in Section 4 (see Definition 4).

Since the input graph G is given with a nice TD that is not necessarily disjoint-branch, we describe, in Section 5.2, how to transform its nice TD into one that is nice and disjoint-branch (Definition 5). This involves adding additional “vertices” to the bags of the tree, which are not vertices in the original graph. Essentially, some vertices in the graph G will be represented by two or more “vertices” in the generated nice DBTD. The introduction of these new vertices is done in a way that maintains the consistency with respect to the vertex-assignments in the original (nice) TD. Importantly, our algorithm does not require the graph G to be a rooted-directed-path graph [19, 30], and does not change it. The transformation from a nice TD to a nice DBTD comes at the cost of at most doubling the width of the nice TD. The necessity of this step, the details of the transformation, its correctness, and runtime analysis are presented in Section 5.2; some of the technical details are deferred to the full version of this paper [39]. After the transformation from a nice TD to a nice DBTD, the preprocessing phase proceeds by dynamic programming over the generated nice DBTD.

For the sake of completeness, we show in the full version of this paper [39] how to transform a DBTD (Definition 2) to a nice DBTD in polynomial time, while preserving its width. While the nice DBTD structure is new, transforming a DBTD to a nice DBTD is similar to transforming a TD to a nice TD (Chapter 13 in [43]).

In Section 5.4, we describe how to represent the factor tables of Section 5.1 as tries. This allows us to test whether an assignment is extendable in time O(tw(G)), and allows us to obtain fixed-parameter-linear delay of O(ntw(G)) (see Proposition 4).

5.1 Factors in the TD

Let (𝒯,χ) be a TD of G, and uV(𝒯). Recall that Gu=defG[Vu] is the graph induced by Vu=deftV(𝒯u)χ(t). We define a labeling of the bag χ(u) to be an assignment φu:χ(u)F. There are |F||χ(u)| possible labelings of χ(u). In Section 4, we defined what it means for a minimal dominating set D𝒮(G) to be consistent with a labeling. Next, we define what it means for a set of vertices DuVu to be consistent with an assignment φu:χ(u)F.

Definition 26.

Let uV(𝒯), and φu:χ(u)F. A subset DuVu is consistent with φu if the following holds for every vχ(u):

  1. 1.

    vDu if and only if φu(v)Fσ=def{[0]σ,[1]σ,σI}.

  2. 2.

    If φu(v)=[j]ω where j{0,1}, then |N(v)Du|1, and |N(v)(Duχ(u))|=j (i.e., v has j neighbors in Duχ(u)).

  3. 3.

    If φu(v)=[j]ρ where j{0,1,2}, then v has at least j neighbors in Duχ(u). That is, |N(v)(Duχ(u))|j.

  4. 4.

    If φu(v)=[1]σ, then v has a neighbor wN(v)(Vu(Duχ(u)), such that N(w)Du={v}. In addition, none of v’s neighbors in χ(u) are assigned label [1]ω.

  5. 5.

    If φu(v)=[0]σ, then for every wN(v)(Vu(Duχ(u))), it holds that |N(w)Du|2.

  6. 6.

    If φu(v)=σI, then every neighbor of v in Vu is assigned a label in Fρ.

In addition, for every vVuχ(u) one of the following holds:

  1. 1.

    vDu and N(v)Du (i.e., v is dominated by Du).

  2. 2.

    vDu, and has a private neighbor in VuDu. That is, there exists a wN(v)(VuDu) such that N(w)Du={v}.

  3. 3.

    vDu, does not have a private neighbor in VuDu, and N(v)Du=.

We denote by u:Fχ(u){0,1} a mapping that assigns every labeling φu:χ(u)F a Boolean indicating whether there exists a subset DuVu that is consistent with φu according to Definition 5.1. That is, u(φu)=1 if and only if there exists a set DuVu that is consistent with φu:χ(u)F according to Definition 5.1. The factors of the TD are the mappings {u:uV(𝒯)}.

Example 27.

Consider node u marked (with a double circle) in Figure 3(b), and an assignment φu:χ(u)F. Note that N(a)(Vuχ(u))=N(a){d,e}=. According to Definition 5.1, every consistent subset DuVu can induce only one of {σI,[0]ω,[0]σ,[0]ρ} to the vertex a. Equivalently, for any φu:χ(u)F where φu(a){σI,[0]ω,[0]σ,[0]ρ}, it holds that u(φu)=0.

Consider the labeling where φu(a)=[0]σ. If φu(b)=[1]ω, then by Definition 5.1 and the fact that N(b)(Vuχ(u))={d}, in any DuVu that is consistent with φu, it must hold that dDu. But then, |N(b)Du|=2>1, violating the constraint that |N(b)Du|1. Hence, no such consistent set Du exists, and u(φu)=0. If φu(a)=[0]σ, φu(b)=[0]ω, and φu(c)=[1]ρ, then by Definition 5.1 and the fact that N(c)(Vuχ(u))={d}, in any DuVu that is consistent with φu, it must hold that dDu. Since dDuN(b), then Du cannot be consistent with the assignment φu(b)=[0]ω. Therefore, u(φu)=0.

Now, consider the assignment where φu(a)=[0]σ, and φu(b)=φu(c)=[0]ω. In this case u(φu)=1 because the set Du={a,e} is consistent with φu according to Definition 5.1. Indeed, N(b)(Duχ(u))=N(c)(Duχ(u))=, hence Du is consistent with label [0]ω on vertices b and c as required. Since N(a)(Vu(Duχ(u))=, then Du is consistent with φu(a)=[0]σ. We now consider vertices Vuχ(u)={d,e}. Observe that N(d)Du={e}, and thus d is dominated by Du. Also, e has a private neighbor in VuDu. Indeed, N(d)Du={e}. Figure 3(c) presents other assignments φu to the vertices of χ(u) where u(φu)=1, along with the appropriate consistent subsets of vertices (i.e., according to Definition 5.1).

5.2 From Nice to Disjoint-Nice TD

In this section, we show how the encoding of an instance of the Dom-Enum problem in a nice TD (𝒯,χ), of width w, can be represented using a nice disjoint branch TD (𝒯,χ), whose width is at most 2w, and how this representation is built in time O(nw2). We start with an example that provides some intuition regarding the requirement for a nice DBTD.

Example 28.

Let uV(𝒯) be a regular (i.e., not disjoint) join node in a nice TD (𝒯,χ), with children u0 and u1 (Definition 2). Hence, χ(u)=χ(u0)=χ(u1). For simplicity, suppose that χ(u)={v}. Consider the assignment φu:χ(u)F, where φu(v)=[1]ω. By definition, a subset DuVu that is consistent with φu(v) (Definition 5.1) is such that |DuN(v)|=1. This means that Du includes exactly one neighbor of v in Gu0 (and no neighbors of v from Gu1), or vice versa. Therefore, to infer whether there exists a subset DuVu that is consistent with the assignment φu={v[1]ω} (i.e., u(φu)=1), we need to assign distinct labels to v in the two sub-trees 𝒯u0 and 𝒯u1, corresponding to the induced subgraphs Gu0 and Gu1 respectively. Specifically, we need to assign φu0(v)=[1]ω and φu1(v)=[0]ω, or φu0(v)=[0]ω and φu1(v)=[1]ω. The transformation of Lemma 5.2 represents v as two distinct vertices v0,v1 in 𝒯u0 and 𝒯u1 respectively, while preserving the integrity of the assignment; that is, |N(v)Du|=1.

The details of the algorithm that generates a nice DBTD from a nice TD (𝒯,χ), proof of correctness and runtime analysis are deferred to the full version of this paper [39]. Here, we provide some intuition. The algorithm transforms every join node uV(𝒯) with children u0 and u1 to a disjoint join node (see Definition 5). To that end, the algorithm adds a new node u to 𝒯, such that χ(u)=defχ(u)vχ(u){v0,v1}, where v0 and v1 are variables used to represent the neighbors of v in G[Vu0] and G[Vu1] respectively. Every occurrence of v in 𝒯u0 is replaced with v0, and every occurrence of v in 𝒯u1 is replaced with v1. To maintain the niceness of the resulting TD, the algorithm adds 3|χ(u)|2 additional nodes (of type forget and introduce, see details and illustration in [39]). Since v0 and v1 essentially represent vertex v in G[Vu0] and G[Vu1] respectively, then certain constraints are placed on the assignments φu:χ(u)F associated with node u. For example, φu(v)Fa if and only if φu(vi)Fa for a{σI,σ,ρ,ω} and i{0,1}. Another such constraint, for example, is that φu(v)=[1]ω if and only if φu(v0)=[x0]ω, φu(v1)=[x1]ω, and x0+x1=1. The complete list of constraints are specified in [39]. We refer to this set of constraints as local constraints. Observe that local constraints can be verified in time O(|χ(u)|).

Let uV(𝒯), and let κu:Fχ(u){0,1} be the set of local constraints on Fχ(u) that can be verified in time O(|χ(u)|). We say that an assignment φuFχ(u) is consistent with κu if κu(φu)=1. For a node uV(𝒯), and local constraints κu:Fχ(u){0,1}, we let Ku=def{φu:χ(u)F:κu(φu)=1}. For a given set of local constraints {κu:uV(𝒯)}, and letting s=|F|, the effective width of (𝒯,χ) is defined:

efftw(𝒯,χ)=defmaxuV(𝒯)logs|Ku| (2)

Observe that in any dynamic programming algorithm over a TD, the runtime and memory consumption of the algorithm depends exponentially on the effective width of the TD.

Lemma 29.

If a graph G has a nice TD (𝒯,χ) of width w, then there is an algorithm that in time O(nw2) constructs a nice disjoint-branch TD (𝒯,χ) with the following properties:

  1. 1.

    V(𝒯)V(𝒯), where for every node uV(𝒯), there is a bijection between χ(u) and χ(u).

  2. 2.

    For every node uV(𝒯), and every φu:χ(u)F, it holds that u(φu)=1 if and only if u(φu)=1, where φu:χ(u)F and u are the corresponding labeling and factor of node u in (𝒯,χ).

  3. 3.

    The effective width of (𝒯,χ) is at most 2w.

Conditions (1) and (2) of Lemma 5.2 guarantee that for every node uV(𝒯), and φu:χ(u)F, a subset DuVu is consistent with φu (in (𝒯,χ)) according to Definition 5.1, if and only if Du is consistent with φu (in (𝒯,χ)) according to Definition 5.1 (following the appropriate name-change to the vertices). This guarantees that, following the preprocessing phase, the generated TD (𝒯,χ) can be used for enumerating the minimal dominating sets of G. The proof of Lemma 5.2 and the pseudocode detailing the construction of a nice disjoint branch TD is deferred to the full version of this paper [39].

In [39], we present the algorithm that receives as input a nice DBTD (𝒯,χ) and a set of local constraints {κu:uV(𝒯)}, which computes the factor tables u:φu{0,1} for every labeling φu:χ(u)F, and every node uV(𝒯), by dynamic programming. The factor tables account for the local constraints. That is, if u(φu)=1 then κu(φu)=1. We define s=defmaxvV(G)|Fv|, where FvF is the set of labels that may be assigned to v.

Lemma 30.

Let (𝒯,χ) be a nice DBTD whose width is w, and let {κu:uV(𝒯)} be a set of local constraints. There is an algorithm that in time O(nwsw) computes the factors of (𝒯,χ) such that for every uV(𝒯), and every labeling φu:χ(u)F, it holds that u(φu)=1 if and only if κu(φu)=1, and there exists a subset DuVu that is consistent with φu according to Definition 5.1.

5.3 Factor Sizes and Runtime for trans-enum (proof of Corollary 1)

Recall that for every vV(G), we denote by FvF the domain of v. That is, Fv is the set of labels that may be assigned to v in any labeling φu:χ(u)F where uV(𝒯) and vχ(u). In Lemma 5.2, we establish that the dynamic programming over the nice DBTD takes time O(nwsw) where w is the width of the DBTD, and s=maxvV(G)|Fv|. In Section 5.2, we show that if tw(G)=k, then G has an embedding to a disjoint-branch TD whose effective width is at most w=2k. This allows us to prove Theorem 1. Next, we show that for trans-enum, every vertex can be assigned only one of s=5 labels (i.e., as opposed to 8 labels), leading to better runtimes of the preprocessing phase, thus proving Corollary 1.

In Section 3, we reduced trans-enum on hypergraph to dom-enum over the tripartite graph B=defB(), where V(B)={v}V(){ye:e()}. By Theorem 9, we are interested in minimal dominating sets D𝒮(B), where D{v}V(), and DV(). In particular, D𝒮(B) where D{ye:e()}=. Therefore, vertices in {ye:e()} can only be assigned labels in FωFρ. If wV(), then by construction N(w){v}{ye:e()}. Consequently, vertices wV() can have at most one neighbor in the minimal dominating set; namely the vertex v. Therefore, vertices in V() can only be assigned labels in FσFω. Finally, the only dominating set of B that excludes v and {ye:e()}, is V(). Since we are interested only in minimal transversals that are strictly included in V(), then v must be dominating, and can only be assigned labels in Fσ. So, we get that vertices in {ye:e()} can only be assigned labels from FωFρ, vertices in V() can only be assigned labels in FσFω, and the vertex v can only be assigned labels in Fσ. Since |FσFω|=|FωFρ|=5, and |Fσ|=3, every vertex of B can be assigned one of at most 5 labels. Therefore, for trans-enum the runtime of the preprocessing phase is in O(nw5w). This proves Corollary 1.

5.4 Compact Representation with Tries

A trie is a tree-based data structure used for efficient retrieval of strings. It organizes the strings by storing characters as tree-nodes, with common prefixes shared among multiple strings. Each node represents a character, and its outgoing edges possible next characters. The root node represents an empty string, and leaf nodes represent complete strings.

In Lemma 5.2 we establish the correctness and runtime guarantees of the algorithm that receives as input a nice DBTD (𝒯,χ), and a set of local constraints {κu:uV(𝒯)}, which computes the factor tables {u:uV(𝒯)} where u:φu{0,1}. We represent u as a trie where every root-to-leaf path in u represents an assignment φu:χ(u)F, or a word in Fχ(u), such that u(φu)=1. Let χ(u)={v1,,vk} where the indices represent a complete order over χ(u). We view every assignment φu:χ(u)F as a string: φu(v1)φu(v2)φu(vk) over the alphabet F. In our construction, every string in the trie has precisely k characters from F, where the first character represents the assignment to v1, the second character the assignment to v2, etc. The advantage of this representation is that given an assignment φu:χ(u)F, we can check if u(φu)=1 in O(k) time by traversing the trie from root to leaf to check whether the string φu(v1)φu(vk) appears in the trie u. In the tabular representation, this requires traversing all assignments in the table, which, in the worst case, takes time O(|F|k). The order used to generate the tries is derived from the complete order Q defined in Section 4.1. See Figure 3(c) for an illustration.

6 Correctness and Analysis of The Enumeration Algorithm

We prove correctness of algorithm EnumDS presented in Section 4, and show that its delay is O(nw) where n=|V(G)| and w=tw(G). By Proposition 4, this requires showing that both IncrementLabeling and IsExtendable are correct and run in time O(w). The former is established in Lemmas 4.2 and 4.2. We present the IsExtendable procedure (Figure 2), prove its correctness, and show that it runs in time O(w). The input to IsExtendable is the pair (,θi) where =(𝒯,χ) is the nice DBJT (Definition 5) that is the output of the preprocessing phase described in Section 5, and θi:ViF is an assignment which meets the conditions of Proposition 4.2. In Lemmas 5.2 and 5.2 we established that =(𝒯,χ) has the property that every node uV(𝒯) is associated with a factor u:Fχ(u){0,1} where u(φu)=1 if and only if there exists a subset DuVu that is consistent with the assignment φu:χ(u)F according to Definition 5.1. In Section 5.4, we showed how to check whether u(φu)=1 in time O(|χ(u)|)=O(w).

Theorem 31.

IsExtendable returns true iff θi𝚯i is extendable, and runs in time O(tw(G)).

The proof of Theorem 31 relies on the fact that =(𝒯,χ) is a processed nice disjoint-branch TD. The proof is deferred to the full version of this paper [39].

7 Conclusion

We present an efficient algorithm for enumerating minimal hitting sets parameterized by the treewidth of the hypergraph. We introduced the nice disjoint-branch TD suited for this task and potentially useful for other problems involving the enumeration of minimal or maximal solutions with respect to some property, common in data management. As future work, we intend to publish an implementation of our algorithm and empirically evaluate it on datasets for data-profiling problems [1, 45]. We also plan to investigate ranked enumeration of minimal hitting sets parameterized by treewidth.

References

  • [1] Ziawasch Abedjan, Lukasz Golab, Felix Naumann, and Thorsten Papenbrock. Data Profiling. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2018. doi:10.2200/S00878ED1V01Y201810DTM052.
  • [2] Antoine Amarilli, Pierre Bourhis, Louis Jachiet, and Stefan Mengel. A circuit-based approach to efficient enumeration. In Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl, editors, 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland, volume 80 of LIPIcs, pages 111:1–111:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.ICALP.2017.111.
  • [3] Stefan Arnborg, Jens Lagergren, and Detlef Seese. Easy problems for tree-decomposable graphs. J. Algorithms, 12(2):308–340, 1991. doi:10.1016/0196-6774(91)90006-K.
  • [4] Guillaume Bagan. MSO queries on tree decomposable structures are computable with linear delay. In Zoltán Ésik, editor, Computer Science Logic, 20th International Workshop, CSL 2006, 15th Annual Conference of the EACSL, Szeged, Hungary, September 25-29, 2006, Proceedings, volume 4207 of Lecture Notes in Computer Science, pages 167–181. Springer, 2006. doi:10.1007/11874683_11.
  • [5] Julien Baste, Ignasi Sau, and Dimitrios M. Thilikos. A complexity dichotomy for hitting connected minors on bounded treewidth graphs: the chair and the banner draw the boundary. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 951–970. SIAM, 2020. doi:10.1137/1.9781611975994.57.
  • [6] Paul Beame, Guy Van den Broeck, Eric Gribkoff, and Dan Suciu. Symmetric weighted first-order model counting. In Tova Milo and Diego Calvanese, editors, Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS 2015, Melbourne, Victoria, Australia, May 31 - June 4, 2015, pages 313–328. ACM, 2015. doi:10.1145/2745754.2745760.
  • [7] Leopoldo E. Bertossi. Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. doi:10.2200/S00379ED1V01Y201108DTM020.
  • [8] Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow., 13(11):2270–2283, 2020. URL: http://www.vldb.org/pvldb/vol13/p2270-birnick.pdf.
  • [9] Thomas Bläsius, Tobias Friedrich, Julius Lischeid, Kitty Meeks, and Martin Schirneck. Efficiently enumerating hitting sets of hypergraphs arising in data profiling. J. Comput. Syst. Sci., 124:192–213, 2022. doi:10.1016/j.jcss.2021.10.002.
  • [10] Endre Boros, Khaled M. Elbassioni, Vladimir Gurvich, and Leonid Khachiyan. An efficient incremental algorithm for generating all maximal independent sets in hypergraphs of bounded dimension. Parallel Process. Lett., 10(4):253–266, 2000. doi:10.1142/S0129626400000251.
  • [11] Jan Chomicki and Jerzy Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1-2):90–121, 2005. doi:10.1016/j.ic.2004.04.007.
  • [12] Jan Chomicki, Jerzy Marcinkowski, and Slawomir Staworko. Computing consistent query answers using conflict hypergraphs. In David A. Grossman, Luis Gravano, ChengXiang Zhai, Otthein Herzog, and David A. Evans, editors, Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management, Washington, DC, USA, November 8-13, 2004, pages 417–426. ACM, 2004. doi:10.1145/1031171.1031254.
  • [13] Bruno Courcelle. Graph rewriting: An algebraic and logic approach. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics, pages 193–242. Elsevier and MIT Press, 1990. doi:10.1016/b978-0-444-88074-1.50010-x.
  • [14] Bruno Courcelle. Linear delay enumeration and monadic second-order logic. Discret. Appl. Math., 157(12):2675–2700, 2009. doi:10.1016/j.dam.2008.08.021.
  • [15] Bruno Courcelle. On the model-checking of monadic second-order formulas with edge set quantifications. Discret. Appl. Math., 160(6):866–887, 2012. doi:10.1016/J.DAM.2010.12.017.
  • [16] Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Daniel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer Publishing Company, Incorporated, 1st edition, 2015.
  • [17] John Doner. Tree acceptors and some of their applications. J. Comput. Syst. Sci., 4(5):406–451, 1970. doi:10.1016/S0022-0000(70)80041-1.
  • [18] Louis Dublois, Michael Lampis, and Vangelis Th. Paschos. Upper dominating set: Tight algorithms for pathwidth and sub-exponential approximation. Theor. Comput. Sci., 923:271–291, 2022. doi:10.1016/J.TCS.2022.05.013.
  • [19] David Duris. Some characterizations of γ and β-acyclicity of hypergraphs. Inf. Process. Lett., 112(16):617–620, 2012. doi:10.1016/j.ipl.2012.05.005.
  • [20] Thomas Eiter and Georg Gottlob. Identifying the minimal transversals of a hypergraph and related problems. SIAM J. Comput., 24(6):1278–1304, 1995. doi:10.1137/S0097539793250299.
  • [21] Thomas Eiter and Georg Gottlob. Hypergraph transversal computation and related problems in logic and AI. In Sergio Flesca, Sergio Greco, Nicola Leone, and Giovambattista Ianni, editors, Logics in Artificial Intelligence, European Conference, JELIA 2002, Cosenza, Italy, September, 23-26, Proceedings, volume 2424 of Lecture Notes in Computer Science, pages 549–564. Springer, 2002. doi:10.1007/3-540-45757-7_53.
  • [22] Thomas Eiter, Georg Gottlob, and Kazuhisa Makino. New results on monotone dualization and generating hypergraph transversals. SIAM J. Comput., 32(2):514–537, 2003. doi:10.1137/S009753970240639X.
  • [23] Thomas Eiter, Kazuhisa Makino, and Georg Gottlob. Computational aspects of monotone dualization: A brief survey. Discret. Appl. Math., 156(11):2035–2049, 2008. doi:10.1016/j.dam.2007.04.017.
  • [24] Ronald Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. J. ACM, 30(3):514–550, 1983. doi:10.1145/2402.322390.
  • [25] Jörg Flum, Markus Frick, and Martin Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716–752, 2002. doi:10.1145/602220.602222.
  • [26] Jörg Flum and Martin Grohe. Parameterized Complexity Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2006. doi:10.1007/3-540-29953-X.
  • [27] Michael L. Fredman and Leonid Khachiyan. On the complexity of dualization of monotone disjunctive normal forms. J. Algorithms, 21(3):618–628, 1996. doi:10.1006/jagm.1996.0062.
  • [28] Markus Frick and Martin Grohe. The complexity of first-order and monadic second-order logic revisited. Annals of Pure and Applied Logic, 130(1):3–31, 2004. Papers presented at the 2002 IEEE Symposium on Logic in Computer Science (LICS). doi:10.1016/j.apal.2004.01.007.
  • [29] Andrew Gainer-Dewar and Paola Vera-Licona. The minimal hitting set generation problem: Algorithms and computation. SIAM J. Discret. Math., 31(1):63–100, 2017. doi:10.1137/15M1055024.
  • [30] Fanica Gavril. A recognition algorithm for the intersection graphs of directed paths in directed trees. Discret. Math., 13(3):237–249, 1975. doi:10.1016/0012-365X(75)90021-7.
  • [31] Georg Gottlob, Reinhard Pichler, and Fang Wei. Bounded treewidth as a key to tractability of knowledge representation and reasoning. Artif. Intell., 174(1):105–132, 2010. doi:10.1016/J.ARTINT.2009.10.003.
  • [32] Georg Gottlob, Reinhard Pichler, and Fang Wei. Monadic datalog over finite structures of bounded treewidth. ACM Trans. Comput. Log., 12(1):3:1–3:48, 2010. doi:10.1145/1838552.1838555.
  • [33] Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Sanjeev Saluja, Hannu Toivonen, and Ram Sewak Sharm. Discovering all most specific sentences. ACM Trans. Database Syst., 28(2):140–174, 2003. doi:10.1145/777943.777945.
  • [34] Mamadou Moustapha Kanté, Vincent Limouzy, Arnaud Mary, and Lhouari Nourine. On the enumeration of minimal dominating sets and related notions. SIAM J. Discret. Math., 28(4):1916–1929, 2014. doi:10.1137/120862612.
  • [35] Ahmet Kara and Dan Olteanu. Covers of query results. In Benny Kimelfeld and Yael Amsterdamer, editors, 21st International Conference on Database Theory, ICDT 2018, March 26-29, 2018, Vienna, Austria, volume 98 of LIPIcs, pages 16:1–16:22. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPICS.ICDT.2018.16.
  • [36] Batya Kenig and Avigdor Gal. On the impact of junction-tree topology on weighted model counting. In Christoph Beierle and Alex Dekhtyar, editors, Scalable Uncertainty Management - 9th International Conference, SUM 2015, Québec City, QC, Canada, September 16-18, 2015. Proceedings, volume 9310 of Lecture Notes in Computer Science, pages 83–98. Springer, 2015. doi:10.1007/978-3-319-23540-0_6.
  • [37] Batya Kenig, Avigdor Gal, and Ofer Strichman. A new class of lineage expressions over probabilistic databases computable in p-time. In Weiru Liu, V. S. Subrahmanian, and Jef Wijsen, editors, Scalable Uncertainty Management - 7th International Conference, SUM 2013, Washington, DC, USA, September 16-18, 2013. Proceedings, volume 8078 of Lecture Notes in Computer Science, pages 219–232. Springer, 2013. doi:10.1007/978-3-642-40381-1_17.
  • [38] Batya Kenig, Benny Kimelfeld, Haoyue Ping, and Julia Stoyanovich. Querying probabilistic preferences in databases. In Emanuel Sallinger, Jan Van den Bussche, and Floris Geerts, editors, Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017, pages 21–36. ACM, 2017. doi:10.1145/3034786.3056111.
  • [39] Batya Kenig and Dan Shlomo Mizrahi. Enumeration of minimal hitting sets parameterized by treewidth. CoRR, abs/2408.15776, 2024. doi:10.48550/arXiv.2408.15776.
  • [40] Batya Kenig, Pranay Mundra, Guna Prasaad, Babak Salimi, and Dan Suciu. Mining approximate acyclic schemes from relations. In David Maier, Rachel Pottinger, AnHai Doan, Wang-Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo, editors, Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, pages 297–312. ACM, 2020. doi:10.1145/3318464.3380573.
  • [41] Batya Kenig and Nir Weinberger. Quantifying the loss of acyclic join dependencies. In Floris Geerts, Hung Q. Ngo, and Stavros Sintos, editors, Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2023, Seattle, WA, USA, June 18-23, 2023, pages 329–338. ACM, 2023. doi:10.1145/3584372.3588658.
  • [42] Benny Kimelfeld, Ester Livshits, and Liat Peterfreund. Counting and enumerating preferred database repairs. Theor. Comput. Sci., 837:115–157, 2020. doi:10.1016/j.tcs.2020.05.016.
  • [43] Ton Kloks. Treewidth, Computations and Approximations, volume 842 of Lecture Notes in Computer Science. Springer, 1994. doi:10.1007/BFb0045375.
  • [44] Joachim Kneis and Alexander Langer. A practical approach to courcelle’s theorem. In Milan Ceska, Zdenek Kotásek, Mojmír Kretínský, Ludek Matyska, and Tomás Vojnar, editors, Proceedings of the International Doctoral Workshop on Mathematical and Engineering Methods in Computer Science, MEMICS 2008, Znojmo, Czech Republic, November 14-16, 2008, volume 251 of Electronic Notes in Theoretical Computer Science, pages 65–81. Elsevier, 2008. doi:10.1016/J.ENTCS.2009.08.028.
  • [45] Jan Kossmann, Thorsten Papenbrock, and Felix Naumann. Data dependencies for query optimization: a survey. VLDB J., 31(1):1–22, 2022. doi:10.1007/S00778-021-00676-3.
  • [46] Sebastian Kruse and Felix Naumann. Efficient discovery of approximate dependencies. Proc. VLDB Endow., 11(7):759–772, 2018. doi:10.14778/3192965.3192968.
  • [47] Alexander Langer, Felix Reidl, Peter Rossmanith, and Somnath Sikdar. Practical algorithms for MSO model-checking on tree-decomposable graphs. Comput. Sci. Rev., 13-14:39–74, 2014. doi:10.1016/j.cosrev.2014.08.001.
  • [48] Ester Livshits, Alireza Heidari, Ihab F. Ilyas, and Benny Kimelfeld. Approximate denial constraints. Proc. VLDB Endow., 13(10):1682–1695, 2020. doi:10.14778/3401960.3401966.
  • [49] Daniel Lokshtanov, Dániel Marx, and Saket Saurabh. Slightly superexponential parameterized problems. SIAM J. Comput., 47(3):675–702, 2018. doi:10.1137/16M1104834.
  • [50] Heikki Mannila and Kari-Jouko Räihä. Dependency inference. In Peter M. Stocker, William Kent, and Peter Hammersley, editors, VLDB’87, Proceedings of 13th International Conference on Very Large Data Bases, September 1-4, 1987, Brighton, England, pages 155–158. Morgan Kaufmann, 1987. URL: http://www.vldb.org/conf/1987/P155.PDF.
  • [51] Keisuke Murakami and Takeaki Uno. Efficient algorithms for dualizing large-scale hypergraphs. Discret. Appl. Math., 170:83–94, 2014. doi:10.1016/J.DAM.2014.01.012.
  • [52] Ronald C. Read and Robert E. Tarjan. Bounds on backtrack algorithms for listing cycles, paths, and spanning trees. Networks, 5(3):237–252, 1975. doi:10.1002/net.1975.5.3.237.
  • [53] Klaus Reinhardt. The complexity of translating logic to finite automata. In Erich Grädel, Wolfgang Thomas, and Thomas Wilke, editors, Automata, Logics, and Infinite Games: A Guide to Current Research [outcome of a Dagstuhl seminar, February 2001], volume 2500 of Lecture Notes in Computer Science, pages 231–238. Springer, 2001. doi:10.1007/3-540-36387-4_13.
  • [54] Günter Rote. Minimal dominating sets in a tree: Counting, enumeration, and extremal results. CoRR, abs/1903.04517, 2019. arXiv:1903.04517.
  • [55] James W. Thatcher and Jesse B. Wright. Generalized finite automata theory with an application to a decision problem of second-order logic. Math. Syst. Theory, 2(1):57–81, 1968. doi:10.1007/BF01691346.
  • [56] Alexei Vazquez. Optimal drug combinations and minimal hitting sets. BMC Syst. Biol., 3:81, 2009. doi:10.1186/1752-0509-3-81.
  • [57] Paola Vera-Licona, Eric Bonnet, Emmanuel Barillot, and Andrei Zinovyev. OCSANA: optimal combinations of interventions from network analysis. Bioinformatics, 29(12):1571–1573, April 2013. _eprint: https://academic.oup.com/bioinformatics/article-pdf/29/12/1571/48886062/bioinformatics_29_12_1571.pdf. doi:10.1093/bioinformatics/btt195.
  • [58] M. Weyer. Modifizierte parametrische komplexitätstheorie. PhD Thesis, Universität Freiburg, 2008.
  • [59] R. Xiao, Y. Yuan, Z. Tan, S. Ma, and W. Wang. Dynamic functional dependency discovery with dynamic hitting set enumeration. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), pages 286–298, Los Alamitos, CA, USA, May 2022. IEEE Computer Society. doi:10.1109/ICDE53745.2022.00026.