Designing Output Sensitive Algorithms for Subgraph Enumeration
Abstract
The enumeration of all subgraphs respecting some structural property is a fundamental task in theoretical computer science, with practical applications in many branches of data mining and network analysis. It is often of interest to only consider solutions (subgraphs) that are maximal under inclusion, and to achieve output-sensitive complexity, i.e., bounding the running time with respect to the number of subgraphs produced. In this paper, we provide a survey of techniques for designing output-sensitive algorithms for subgraph enumeration, including partition-based approaches such as flashlight search, solution-graph traversal methods such as reverse search, and cost amortization strategies such as push-out amortization. We also briefly discuss classes of efficiency, hardness of enumeration, and variants such as approximate enumeration. The paper is meant as an accessible handbook for learning the basics of the field and as a practical reference for selecting state-of-the-art subgraph enumeration strategies fitting to one’s needs.
Keywords and phrases: Graph algorithms, Graph enumeration, Output sensitive enumeration
Category: Research
Funding: Alessio Conte: Partially supported by MUR PRIN 2022 project EXPAND: scalable algorithms for EXPloratory Analyses of heterogeneous and dynamic Networked Data (#2022TS4Y3N).
2012 ACM Subject Classification: Mathematics of computing → Graph enumeration; Mathematics of computing → Graph algorithms
Editors: Alessio Conte, Andrea Marino, Giovanna Rosone, and Jeffrey Scott Vitter
Series and Publisher: Open Access Series in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
The goal of enumeration is to systematically list all feasible solutions to a given problem. Unlike optimization problems, which seek to find a single best solution according to an objective function – i.e., an extreme case – enumeration problems aim to identify all solutions that satisfy given constraints, representing local extreme cases.
A well-constructed enumeration model strikes a balance between the size and the number of solutions. When solutions are large, it is preferable to have fewer of them. To achieve this, models often incorporate parameters such as solution size, frequency, and weight, or they unify similar solutions to manage complexity. Also, in order to limit the number of solutions, we are often interested in listing only minimal or maximal solutions, i.e., solutions that cannot be reduced or enlarged without violating the property of interest. It is important to note that the number of solutions typically grows, often exponentially, with the size of the input. When dealing with small input sizes, brute-force algorithms and simple implementations can effectively solve the problem. However, for large-scale data, more sophisticated algorithmic techniques are required to ensure computation time increases in a controlled manner as the input size grows.
The development of algorithms for enumerating all possible solutions to a specific combinatorial problem has a long history, dating back at least to the 1960s. During this period, researchers began tackling the problem of enumerating specific graph-theoretic structures, such as shortest paths and cycles. As noted by David Eppstein [27], enumeration problems have numerous applications, including:
1. Identifying structures that satisfy additional constraints that are difficult to optimize,
2. Evaluating the quality of a model for a given problem by assessing the number of incorrect structures,
3. Analyzing the sensitivity of structures to variations in problem parameters,
4. Exploring a broader class of structures beyond just the optimal ones to gain deeper insight into the problem.
Over the past fifty years, a wide range of enumeration problems have been studied in the literature, spanning various domains such as geometry, graph and hypergraph theory, order and permutation problems, logic, set theory, and string problems. A recent compendium compiled by some of the authors of this paper includes more than 500 combinatorial problems along with more than 300 references, highlighting the depth and breadth of research in this field.
1.1 Our Contribution
Despite significant progress, the field of enumeration algorithms remains highly active, with many intriguing open problems still under investigation. This paper seeks to contribute to this ongoing research by first providing an overview of the key computational challenges in designing and analyzing enumeration algorithms, and then presenting some of the most effective techniques for developing efficient enumeration methods.
In particular, this paper serves as an introductory survey for newcomers interested in the design of efficient output-sensitive enumeration algorithms. At the same time, it may also serve as a valuable reference for experienced researchers, offering insights into state-of-the-art techniques and pointers to relevant literature on well-established concepts in the field.
1.2 Structure of the paper and roadmap
In Section 2, we discuss the general algorithmic challenges of enumeration and briefly examine brute-force approaches. While these methods do not require sophisticated algorithmic design, they provide the basis for understanding enumeration tasks, independently of efficiency considerations. On this latter note, Section 3 presents the various ways efficiency is defined in the context of listing algorithms.
Section 4 focuses on partition-based approaches for enumeration. We first introduce a backtracking technique based on recursively partitioning the set of solutions. Depending on how duplication is managed, this approach can either be straightforward, or leverage forbidden sets. To improve efficiency, we present enhancements using the so-called flashlight method and the extension problem. As a byproduct of these techniques, we present their implications for assessment. Section 5 introduces strategies that intuitively navigate the space of solutions, often modeled as a solution graph. These include reverse search, the input-restricted problem approach, and proximity search.
In Section 6, we introduce amortization techniques, which are frequently used to refine partition-based methods and achieve (sub)linear average delay. We cover several such techniques, including amortization by children and grandchildren, push-out amortization, and geometric amortization.
Finally, in Section 7, we discuss approaches for determining the inherent difficulty of an enumeration problem. These arguments rely on the complexity of enumerating hypergraph transversals or employ suitable reductions from NP-hard problems. We also highlight here parameterized and approximate enumeration as promising directions for future research.
1.3 Preliminaries
This paper focuses on enumeration problems on graphs, so let us first give some preliminaries on the topic. A graph is a pair $G = (V, E)$, where $V$ is its set of nodes or vertices, and $E$ is the set of edges. When the graph is clear from the context, we just write $V$ and $E$. For an edge $e = \{u, v\}$, we refer to nodes $u$ and $v$ as its endpoints. A self-loop is an edge with equal endpoints. When multiple edges are allowed with the same endpoints, we call the graph a multigraph, and we call multiplicity of an edge the number of times it occurs.
We say that $G' = (V', E')$ is a subgraph of $G = (V, E)$, and we write $G' \subseteq G$, if both $V' \subseteq V$ and $E' \subseteq E$. A subgraph of $G$ is said to be induced by the set of nodes $S \subseteq V$, and denoted by $G[S]$, if it is formed by all edges of $G$ that connect nodes of $S$: $G[S] = (S, \{\{u, v\} \in E : u, v \in S\})$. Given a node $v$, we denote with $G - v$ the subgraph $G[V \setminus \{v\}]$.
There are two main types of graphs: undirected and directed. A graph is undirected if its edges are represented as non-ordered pairs. In other words, two edges are equal if they have the same endpoints, i.e., $\{u, v\} = \{v, u\}$. In this case, the degree $d(v)$ of a node $v$ is the number of edges incident to $v$, and its neighbors are the set of nodes connected to it with an edge: $N(v) = \{u : \{u, v\} \in E\}$. A graph is instead said to be directed if the edges are ordered pairs, that is, edge $(u, v)$ is different from edge $(v, u)$. In this case, we say that edge $(u, v)$ is directed from node $u$ to node $v$. Given a node $v$, we define its out-neighbors as $N^+(v) = \{u : (v, u) \in E\}$, and symmetrically its in-neighbors as $N^-(v) = \{u : (u, v) \in E\}$. We have now two forms of degree for a node $v$: the outdegree $d^+(v)$ is the number of out-neighbors, while the indegree $d^-(v)$ is the number of in-neighbors. We note that, in the case of multigraphs (both directed and undirected), the degrees also account for the multiplicity of the incident edges.
A clique of an undirected graph $G$ is a subgraph of $G$ such that every pair of its distinct nodes is connected by an edge. A matching of a graph is a set of (non-loop) edges that share no endpoints.
Paths and Cycles
A sequence of distinct adjacent nodes of $G$ is called a path: $\pi = v_1, \ldots, v_k$ is a path if $\{v_i, v_{i+1}\} \in E$ for all $i \in \{1, \ldots, k-1\}$, and $v_i \neq v_j$ for all $i \neq j$. Note that this definition is independent of whether $G$ is directed or not. In this case we say that $\pi$ traverses nodes $v_1, \ldots, v_k$, has endpoints $v_1$ and $v_k$, and has length $k - 1$. A path with endpoints $s$ and $t$ is sometimes called an $st$-path. Given two nodes $s, t \in V$, the set of all $st$-paths of $G$ is denoted by $\mathcal{P}_{st}(G)$, where $\mathcal{P}_{ss}(G)$ only contains the trivial path from $s$ to itself. A cycle is a path whose endpoints coincide: in the previous notation, we have $v_1 = v_k$. A graph with no cycles is called acyclic; if it also is directed, it is referred to as a directed acyclic graph, or DAG.
Connectivity
We give here some connectivity notions, which slightly differ according to whether the graph is directed or not. An undirected graph is called connected if there is a path connecting every pair of nodes. A graph is biconnected if for any $v \in V$ the graph $G - v$ is still connected. On the contrary, an articulation point is a node $v$ such that $G - v$ is not connected. The maximal biconnected subgraphs of $G$ are called its biconnected components (BCCs). Analogously, a graph is 2-edge-connected if $G - e$ is connected for each $e \in E$, and edges whose removal disconnects the graph are called bridges. For directed graphs, we have two main notions of connectivity. A directed graph is weakly connected if the underlying undirected graph, that is, the graph obtained by replacing all directed edges with undirected ones, is connected. Graph $G$ is strongly connected if for each pair $u, v \in V$ there are both a $uv$-path and a $vu$-path.
Trees
A tree is a connected acyclic undirected graph. A spanning tree of a graph $G$ is a subgraph $T$ of $G$ such that (i) $T$ is a tree and (ii) $T$ contains all the nodes of $G$. Except for a single vertex, called root and denoted as $r$, every node $v$ of the tree has a unique parent, defined as the only neighbor of $v$ on the $rv$-path. The neighbors of a node that are not its parent are called its children. A node with no children is called a leaf. For a node $v$, we define its rooted subtree $T_v$ as the subgraph formed by node $v$, and all nodes (and edges) that it can reach without traversing its parent.
2 Algorithmic Challenges and Brute Force Approaches
The design of enumeration algorithms involves several crucial aspects that must be considered to ensure both correctness and efficiency. Specifically, an enumeration algorithm must guarantee that each solution is output exactly once, thereby avoiding duplication. A straightforward approach to achieve this is to store all previously found solutions in memory and, upon encountering a new solution, check whether it has already been output. However, this approach can be highly memory-inefficient when solutions are relatively large with respect to the available memory. Addressing this issue would require dynamic memory allocation mechanisms and efficient search structures, such as hash functions. Consequently, a more practical strategy employed by many enumeration algorithms is to determine whether a solution has already been output without explicitly storing all generated solutions.
In addition to direct duplication, implicit forms of redundancy should also be avoided – for instance, ensuring that isomorphic solutions are not redundantly output. To achieve this, it is often beneficial to define a canonical representation for solutions, enabling efficient comparisons. An ideal canonical form establishes a one-to-one mapping between objects and their representations without significantly increasing their size. This transformation allows the enumeration of certain objects to be reframed as the enumeration of their canonical forms. However, for structures such as graphs, sequence data, and matrices, determining isomorphism remains computationally challenging, even when using canonical forms. Nonetheless, in such cases, isomorphism can still be checked using exponential-time algorithms, which, in practice, often perform efficiently when the number of solutions is relatively small.
Simple structures, such as cliques and paths, are generally easy to enumerate. Cliques can be constructed by iteratively adding vertices, while the set of paths can be easily partitioned. However, more complex structures – such as maximal structures (where no additional element can be added without violating a required property), minimal structures (where no element can be removed without violating a required property), or constrained structures – are significantly more challenging to enumerate. In these cases, even if finding a single solution is feasible in polynomial time, the main challenge lies in devising a method to generate additional solutions from a given one, i.e., defining a solution neighborhood, to enable traversal through all solutions by iteratively exploring these neighborhoods.
Using an exponential-time approach to identify each neighboring solution or having an exponential number of neighbors can lead to inefficiency. When a solution requires exponentially many modifications to generate potential neighbors, the enumeration process may take exponential time per solution, as there is no guarantee that any given modification will yield a valid solution. This issue frequently arises in maximal solution enumeration: iteratively removing and adding elements to obtain maximality can allow traversal between solutions, but when the number of such modifications is exponential, the computational cost per solution also becomes exponential. In such cases, reducing the number of neighbors or employing pruning strategies to eliminate redundant computations can significantly improve efficiency.
Even more complex scenarios involve problems where finding a single solution is NP-complete, such as the Hamiltonian cycle problem. Despite this, heuristics often prove effective, particularly when typical instances of the problem are easy in practice, the solutions are not excessively large (as in maximal and minimal structure enumeration), and the size of the solution space is bounded.
For small instance sizes, brute-force algorithms can be a viable approach. These methods include using a divide-and-conquer strategy to generate candidate solutions and selecting feasible ones, or iteratively constructing solutions while removing isomorphic duplicates. Two fundamental brute-force algorithmic schemas are outlined in Algorithms 1 and 2. In Algorithm 1, each solution is represented as an ordered sequence of values. By invoking BruteForce$(1, \emptyset)$, feasible values are recursively determined by extending the current solution $X$, requiring only a check to determine whether $X$ constitutes a valid solution. Conversely, Algorithm 2 also attempts to extend the current solution but incorporates a check at each step to determine whether the solution has been previously encountered, storing results in a database $\mathcal{D}$.
For both algorithms, it is crucial to establish a method for transforming a candidate solution $X$ into another candidate $X'$. Additionally, in both cases, performing an accurate preliminary check to determine whether $X$ is contained in any feasible solution can prevent unnecessary computations, thereby enhancing efficiency.
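To make the schema concrete, the following is a minimal Python sketch of Algorithm 1; the decomposition into `candidates` and `is_solution` oracles, and all names, are our own illustrative choices rather than the paper's pseudocode.

```python
def brute_force(i, X, candidates, is_solution, output):
    # Schema of Algorithm 1 (sketch): X is a partial solution built
    # position by position; solutions are checked and output as found.
    if is_solution(X):
        output(X)
    for v in candidates(i, X):  # feasible values for position i
        brute_force(i + 1, X + [v], candidates, is_solution, output)

# Toy usage: enumerate all subsets of {0, 1, 2}, represented as
# increasing sequences so that each subset is produced exactly once.
items = [0, 1, 2]
brute_force(0, [],
            candidates=lambda i, X: [v for v in items if not X or v > X[-1]],
            is_solution=lambda X: True,
            output=print)
```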
3 Classes of Efficiency
For many enumeration problems, the number of solutions is often exponential in the size of the input instance, which inherently leads to enumeration algorithms requiring at least exponential time. In other words, for enumeration algorithms, the worst-case overall time, which bounds the total time to completion with a function depending on the input size, is often exponential. (Bounds of this type can be found by applying measure & conquer approaches [31], but a discussion of these is outside the scope of this paper.)
However, when the number of solutions is polynomially bounded, it is natural to seek a polynomial-time algorithm. Hence, the most natural approach to enhance the analysis of enumeration algorithms is to assess how the total computation time for generating all solutions correlates with both the input size and the output size. Algorithms analyzed in this manner are commonly referred to as output-sensitive, in contrast to input-sensitive algorithms [32]. In this context, the complexity classes of enumeration problems defined in this section are based on the number of solutions, ensuring that if the solution set is small, an efficient algorithm terminates in polynomial time, while larger solution spaces permit longer runtimes [37]. For more formal definitions, we refer the reader to the recent survey in [60]. We adopt the same notation as [60], noting that the same notions have been referred to by the literature with different names.
We are interested only in problems whose number of solutions is finite and which are polynomially balanced, i.e., the size of each solution is polynomial in the input size [60].
Definition 1.
EnumP is the class of problems whose solutions can be checked in polynomial time.
The problems in EnumP can be seen as the task of listing the solutions (or witnesses) of NP problems [60]; an equivalent notion appears, under a different name, in [56]. In this paper, we will only consider problems in EnumP.
Definition 2 (Output-polynomial time).
An enumeration algorithm operates in output-polynomial time if the time required to output all solutions is bounded by a polynomial in the size of the input and the number of solutions.
This notion has been referred to as polynomial total time by Johnson, Yannakakis, and Papadimitriou [37], and was implicitly used before, for instance by Tarjan [63], and by Paull and Unger [53]. As pointed out by Goldberg [34], we can ask for output-polynomial time algorithms only if the underlying decision problem is not NP-complete. More precisely, the listing version of an NP-complete problem cannot admit an output-polynomial time algorithm unless P = NP. Indeed, suppose that we have an output-polynomial time algorithm for the listing version running within some polynomial time bound. We could use this listing algorithm to decide whether there is at least one solution, by running it until the time bound elapses and checking whether at least one solution has been output.
When generating all solutions is impractical due to excessive runtime, it becomes important to generate at least a subset of them efficiently. Therefore, we should evaluate and limit the total time required to produce a specified number of solutions.
Definition 3 (Incremental polynomial time).
An algorithm is incremental polynomial time if, for every $i$, it generates the first $i$ distinct solutions in time polynomial in $i$ and the input size.
Definition 4 (Average polynomial time).
A listing algorithm takes average polynomial time if the time required to output all the solutions is bounded by $n^c \cdot N$, where $n$ is the input size, $c$ is a constant, and $N$ is the number of solutions.
In other words, an algorithm takes average polynomial time if the average cost per solution is polynomial. This notion has been previously referred to by Valiant [71] using the name P-enumerable. It is worth remarking that the average time per solution has been the subject of a lot of research, which focused in particular on constant amortized time. An algorithm has constant amortized time (CAT) if its total running time divided by the number of solutions is constant (see the book [57]).
The following measure concerns the delay between consecutive solutions, i.e., bounding not the average time between two consecutive solutions but the worst-case time bound between them.
Definition 5 (Polynomial Delay).
An enumeration algorithm operates with polynomial delay if it generates solutions sequentially, ensuring that the time before outputting the first solution, the time between any two consecutive solutions, and the time between the last solution and the end of the algorithm are all bounded by a polynomial in the input size.
Also, it is worth remarking that a lot of research on enumeration algorithms has focused on constant delay. Concerning this we mention the gray coding technique presented in "The Art of Computer Programming", Volume 4 Fascicle 2A, by Knuth [43]. In this context, in order to go from one solution to the next in constant time, two consecutive solutions must differ only by a fixed number of elements.
It is worth noting that having a polynomial delay implies having an average polynomial time. Having an average polynomial time implies having incremental polynomial time. Having incremental polynomial time implies being output-polynomial. The converse of each of these implications is not true.
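In summary, the classes above form a strict chain, which we can restate compactly as:

$$\text{polynomial delay} \;\Rightarrow\; \text{average polynomial time} \;\Rightarrow\; \text{incremental polynomial time} \;\Rightarrow\; \text{output-polynomial time},$$

with none of the reverse implications holding.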
4 Partition Techniques for Enumeration
In this section, we detail the first main enumeration technique. The partition technique is a powerful approach for enumeration problems that systematically divides the solution space into two or more disjoint subsets, ensuring complete coverage of all the solutions while avoiding redundancy. In general, it is based on recursive decomposition, where the solution space is divided into smaller, non-overlapping subsets, allowing each subset to be processed independently. Moreover, each partition is distinct, ensuring that no solution is counted multiple times.
This can be realized by using a backtracking strategy, where partial solutions are incrementally constructed and extended until a complete solution is found or determined to be infeasible. A typical backtracking enumeration algorithm follows these steps:
1. Start with an empty or partial solution $X$.
2. Extend the solution by choosing a candidate $v$ among the available ones, obtaining $X \cup \{v\}$.
3. If the extended solution satisfies some constraints, recursively explore further.
4. If a valid complete solution is found, output it.
5. If constraints are violated or no further extensions are possible, backtrack by undoing the last choice and exploring other possibilities.
From a partition perspective, in Item 2 the set of all the solutions extending $X$ is partitioned into the subsets of solutions extending $X \cup \{v\}$, for the different choices of candidate $v$. The constraints in Item 3 help guarantee that such subsets do not overlap, i.e., that the corresponding subproblems do not contain the same solutions.
For the sake of explanation, we start by describing plain backtracking, where the indexing of the elements is used to get non-overlapping subproblems. In this case, we enlarge solutions guided only by the candidates' indexes. We then introduce backtracking with forbidden sets, where, while recurring, we avoid duplication by more carefully reducing the sets of candidates, explicitly specifying to the recursive calls which candidates should not be considered. We then introduce the flashlight method, which adds suitable checks before recurring, solving the so-called extension problem, in order to improve efficiency.
4.1 Plain Backtracking
A set $\mathcal{F}$ of subsets of $U$ satisfies downward closure if for any $S \in \mathcal{F}$ and any $S' \subseteq S$, we have $S' \in \mathcal{F}$; in other words, for any set belonging to $\mathcal{F}$, any of its subsets also belongs to $\mathcal{F}$. Given a set $U$ and an oracle to decide whether a subset $S \subseteq U$ belongs to $\mathcal{F}$, an unknown family of subsets of $U$ satisfying downward closure, we consider the problem of enumerating all (possibly maximal) elements of $\mathcal{F}$. The backtracking technique is often applied to problems of this type.
In this approach we start from the empty set, and elements are recursively added to a solution. The elements are usually indexed, so that in each iteration, in order to avoid duplication, only elements whose index is greater than that of the current maximum element are added. After performing all examinations concerning one element, by backtracking, all the other possibilities are explored. The basic schema of backtracking algorithms is shown in Algorithm 3. Note that, whenever it is possible to apply this schema, we obtain a polynomial delay algorithm whose space complexity is also polynomial. The technique proposed relies on a depth-first search approach. However, it is worth observing that in some cases of enumeration of families of subsets exhibiting the downward closure property, arising in the mining of frequent patterns (e.g., mining of frequent itemsets), a breadth-first approach can also be successfully used instead of depth-first backtracking. For instance, this is the case of the Apriori algorithm for discovering frequent itemsets [62].
4.1.1 Enumeration of Subsets of Bounded Sum: Backtracking
Consider the problem of enumerating all the subsets of a collection $A$ of $n$ elements whose sum is less than a certain threshold $t$. By using the backtracking schema, it is possible to solve the problem as shown in Algorithm 4. Each iteration outputs a solution and takes $O(n)$ time, so that we have $O(n)$ time per solution. It is worth observing that if we sort the elements of $A$ in advance, then each recursive call can generate a solution in $O(1)$ time, yielding $O(1)$ amortized time per solution.
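A minimal Python sketch of this schema (our rendering of Algorithm 4, with `a` the collection and `t` the threshold; we assume non-negative elements, so that the family of solutions is downward closed):

```python
def bounded_sum_subsets(a, t, i=0, current=(), total=0, output=print):
    # Backtracking schema: every call outputs its (feasible) subset, then
    # tries to extend it using elements of strictly larger index only.
    output(current)
    for j in range(i, len(a)):
        if total + a[j] < t:  # the extended subset is still a solution
            bounded_sum_subsets(a, t, j + 1, current + (a[j],),
                                total + a[j], output)

bounded_sum_subsets([3, 1, 4, 1], t=5)
```

If `a` is sorted in increasing order, the inner loop can stop at the first index that exceeds the threshold, which is the observation behind the improved amortized bound mentioned above.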
4.2 Backtracking with Forbidden Set
In the previous section, namely Section 4.1, the elements’ indexing was helpful to get non-overlapping subproblems, by enlarging solutions only guided by the candidates’ indexes. In the case of backtracking with forbidden sets, while recurring we try to avoid duplication by more carefully reducing the sets of candidates. We do so by explicitly specifying to the recursive calls which candidates should not be considered, through a forbidden set. We showcase this approach through an example, the Bron-Kerbosch algorithm for maximal clique enumeration.
4.2.1 Bron-Kerbosch Algorithm for Maximal Clique Enumeration
The Bron-Kerbosch algorithm is a recursive backtracking algorithm used to enumerate all maximal cliques in an undirected graph. Given a graph $G = (V, E)$, the algorithm maintains three sets:
- $R$: the current clique being constructed.
- $P$: the set of potential candidates that can be added to $R$, meaning that all the vertices in $P$ are connected to all the vertices in $R$.
- $X$: the set of already processed vertices, used to avoid redundant solutions.
A recursive call aims to produce all the maximal cliques containing $R$, some vertices in $P$, and no vertices in $X$. Hence, we partition the set of maximal cliques satisfying this property by analysing all the possible candidates in $P$. When considering a candidate $v \in P$, $v$ is added to the partial solution $R$, while $P$ and $X$ are reduced to include only neighbors of $v$.
We output solutions only if $P$ and $X$ are empty. If indeed $P$ contains some element, it means we can still do further recursion to enlarge $R$, as $R$ is not maximal. If $X$ is non-empty, it also means that $R$ is not maximal, as $R$ could be extended with any element of $X$. Whenever $P = \emptyset$ and $X \neq \emptyset$, no further addition is possible to $R$, but the clique is still not maximal, and the algorithm thus backtracks without performing any output. This is typically called a dead-end, and the presence of such events is the main reason why this algorithm is not output-sensitive.
The algorithm recursively explores the search space, ensuring that each maximal clique is output exactly once. The basic Bron-Kerbosch algorithm (Algorithm 5) recursively builds cliques and removes processed vertices to avoid duplicates. A more efficient version of the algorithm introduces a pivoting strategy [66]. Choosing a pivot vertex $u$ from $P \cup X$ helps reduce the number of recursive calls by limiting the number of candidates that need to be processed. To obtain this algorithm it is sufficient to replace line 5 with the following two lines: "Choose a pivot vertex $u$ from $P \cup X$" and "for each $v \in P \setminus N(u)$ do". The idea of this pivoting strategy is to avoid iterating, at each expansion, on the whole set $P$. The resulting cliques will have to contain either the pivot or one of its non-neighbors, since if none of the non-neighbors of the pivot is included, then we can add the pivot itself to the result. Hence we can avoid iterating on the neighbors of the pivot at this step.
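The following Python sketch renders Algorithm 5 together with the pivoting rule just described; the graph is an adjacency-set dictionary `adj`, and the specific pivot choice (maximizing the number of neighbors in $P$) is one common heuristic, used here only for illustration.

```python
def bron_kerbosch_pivot(adj, R, P, X, output):
    # Enumerate all maximal cliques containing R, some vertices of P,
    # and no vertex of X (sketch).
    if not P and not X:
        output(R)  # no candidates and no excluded vertices: R is maximal
        return
    u = max(P | X, key=lambda w: len(adj[w] & P))  # pivot vertex
    for v in list(P - adj[u]):  # skip the pivot's neighbors
        bron_kerbosch_pivot(adj, R | {v}, P & adj[v], X & adj[v], output)
        P.remove(v)  # v has been processed: forbid it from now on
        X.add(v)

# Toy usage: a triangle {0,1,2} plus the pendant edge {2,3}.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
bron_kerbosch_pivot(adj, set(), set(adj), set(), print)  # {0,1,2}, {2,3}
```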
An alternative approach to enhancing the basic Bron–Kerbosch algorithm involves omitting pivoting at the outermost recursion level and instead carefully selecting the order of recursive calls to minimize the size of the candidate vertex set within each call [28]. The degeneracy of a graph $G$ is defined as the smallest integer $d$ such that every subgraph of $G$ contains at least one vertex with degree at most $d$. Every graph admits a degeneracy ordering, i.e., an ordering of the vertices where each vertex has at most $d$ later neighbors. This ordering can be computed in linear time by repeatedly removing the vertex with the smallest degree among the remaining ones. By processing the vertices in the Bron–Kerbosch algorithm according to a degeneracy ordering, the candidate set $P$ (i.e., the neighbors of the current vertex that appear later in the ordering) is guaranteed to have at most $d$ elements. Conversely, the exclusion set $X$, which consists of previously processed neighbors, may be significantly larger. Although pivoting is omitted at the outermost recursion level, it can still be employed in deeper recursive calls to further optimize performance.
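A degeneracy ordering is easy to compute; the sketch below uses a simple quadratic scan for clarity, whereas the linear-time algorithm mentioned above maintains buckets of vertices by degree.

```python
def degeneracy_ordering(adj):
    # Repeatedly remove a vertex of minimum degree among the remaining
    # ones; returns the ordering and the degeneracy d.
    remaining = {v: set(adj[v]) for v in adj}
    order, d = [], 0
    while remaining:
        v = min(remaining, key=lambda w: len(remaining[w]))
        d = max(d, len(remaining[v]))
        order.append(v)
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return order, d
```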
Time Bounds
The Bron–Kerbosch algorithm is not output-sensitive: unlike some other algorithms for the same task (as shown later), it does not run in polynomial time per maximal clique generated. However, it remains efficient in the worst-case sense. By a result of Moon and Moser (1965), any $n$-vertex graph has at most $3^{n/3}$ maximal cliques [52]. When using a pivot strategy that minimizes recursive calls, the Bron–Kerbosch algorithm runs in $O(3^{n/3})$ time, matching this bound [66]. For sparse graphs, tighter bounds apply. Specifically, the degeneracy-ordering variant of the algorithm runs in $O(d\,n\,3^{d/3})$ time, where $d$ is the degeneracy of the graph. Some $d$-degenerate graphs contain as many as $(n-d)\,3^{d/3}$ maximal cliques, making this bound nearly tight [28].
4.3 Flashlight Search
In the following, for the sake of simplicity, we use the binary partition approach, a backtracking strategy where each recursion node has at most two children, in order to easily introduce a more refined backtracking strategy, called flashlight. The idea can be easily extended to the case where recursion nodes have more than two children.
Let $S$ be a subset of $\mathcal{S}$, the set of solutions, such that all elements of $S$ satisfy a property $\Pi$. The binary partition method outputs $S$ only if the set is a singleton; otherwise, it partitions $S$ into two sets $S_1$ and $S_2$, whose solutions are characterized by the disjoint properties $\Pi_1$ and $\Pi_2$ respectively. This procedure is repeated until the current set of solutions is a singleton. The bipartition schema can be successfully applied to the problem of enumerating the paths of a graph connecting two vertices $s$ and $t$ (as we will see in this section), the perfect matchings of a bipartite graph [68], and the spanning trees of a graph [59]. The flashlight method relies on checking whether there is at least one solution before recurring, i.e., checking whether there is at least one solution in $S_1$ (resp. $S_2$) satisfying $\Pi_1$ (resp. $\Pi_2$). This is usually called an extension problem, as we are checking, while enlarging a partial solution, whether it can be extended/completed to a final solution. This check is crucial for efficiency: if every partition is non-empty, i.e., all the internal nodes of the recursion tree are binary, then the number of internal nodes is bounded by the number of leaves. In addition, if solving the extension problem takes polynomial time, since every leaf outputs a solution, the resulting algorithm is output-polynomial. On the other hand, even if there are empty partitions, i.e., internal unary nodes in the recursion tree, if the height of the tree is bounded by a polynomial in the size of the input and the partition oracle takes polynomial time, then the resulting algorithm is polynomial delay.
We will discuss more about extension problems in Section 4.5.
4.3.1 Enumerating all st-paths in a graph
We showcase the flashlight strategy through a binary partition algorithm for the problem of $st$-path enumeration in an undirected graph $G$. The partition schema chooses an edge $e = \{s, v\}$ incident to $s$, and partitions the set of all the $st$-paths into the ones including $e$ and the ones not including $e$. The $st$-paths including $e$ correspond to the $vt$-paths of the graph obtained by removing $s$ and all the edges incident to it, denoted by $G - s$. The $st$-paths not including $e$ are obtained by removing $e$ and enumerating the $st$-paths in the new graph, denoted by $G - e$. The corresponding pseudocode is shown by Algorithm 6. It is worth observing that if the edge $e$ is badly chosen, a subproblem could generate no solution; in particular, the set of the $vt$-paths in the graph $G - s$ is empty if $t$ is not reachable from $v$, while the set of the $st$-paths in $G - e$ is empty if $t$ is not reachable from $s$. Thus, before performing the recursive call to the subproblems, it is useful to test the validity of $e$, by testing the reachability of $t$ in these modified graphs. This will be our "flashlight", indicating which partitions lead to at least one solution. Note that the height of the corresponding recursion tree is bounded by $m$, since at every level the size of the graph is reduced by at least one edge. The cost per iteration amounts to the reachability test, i.e., $O(m)$ time. Therefore, the algorithm has $O(m^2)$ delay.
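A Python sketch of the flashlight strategy for $st$-paths (our rendering of Algorithm 6: the explicit left/right graph modifications are simulated by a set of blocked vertices, and a plain DFS plays the role of the reachability oracle):

```python
def reachable(adj, src, dst, blocked):
    # DFS reachability test in the graph minus the `blocked` vertices.
    stack, seen = [src], {src}
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        for u in adj[v]:
            if u not in seen and u not in blocked:
                seen.add(u)
                stack.append(u)
    return False

def st_paths(adj, s, t, blocked=frozenset(), prefix=(), output=print):
    # Vertices in `blocked` are already on the current path. Looping over
    # the neighbors of s realizes the repeated binary partition on the
    # first edge of the path; the reachability test is the "flashlight"
    # guaranteeing that every recursive call produces at least one path.
    prefix = prefix + (s,)
    if s == t:
        output(prefix)
        return
    for v in adj[s]:
        if v not in blocked and reachable(adj, v, t, blocked | {s}):
            st_paths(adj, v, t, blocked | {s}, prefix, output)

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
st_paths(adj, 0, 3)  # prints (0, 1, 2, 3) and (0, 2, 3)
```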
This problem has been studied in [63, 65, 55], and in [38], guaranteeing a linear delay. In [5], the latter algorithm has been modified in order to enumerate bubbles. In the particular case of undirected graphs, in Section 4.3.2 we will show an algorithm based on this bipartition approach achieving an output-sensitive amortized complexity, as shown in [6]. In the particular case of shortest paths, the enumeration problem has been studied in [27]. It is worth observing that the problem of enumerating all the $st$-paths in a graph is equivalent to the problem of enumerating all the cycles passing through a given vertex.
4.3.2 Improving st-paths Enumeration: Dynamic Certificate
We can improve the $st$-path enumeration algorithm proposed in the previous section to achieve optimal complexity, by using a dynamic certificate based on the biconnected components of the graph [6]. Let $\mathcal{P}_{st}(G)$ denote the set of $st$-paths in $G$ and, for an $st$-path $\pi$, let $|\pi|$ be the number of its edges. We note that the problem of $st$-path enumeration necessarily requires $\Omega\left(\sum_{\pi \in \mathcal{P}_{st}(G)} |\pi|\right)$ time to just list the output. The algorithm we present here is then optimal, as it lists all the $st$-paths of $G$ in $O\left(m + \sum_{\pi \in \mathcal{P}_{st}(G)} |\pi|\right)$ time, where $m$ is the number of edges of the input graph.
BCCs and Bead Strings
Recall that the biconnected components (BCCs) of $G$ are the inclusion-maximal biconnected subgraphs of $G$. A BCC is called non-trivial if it is made up of at least three nodes. It is known that the BCCs of a connected graph form a tree:
Definition 6.
Let $G$ be a connected graph, and consider the following graph: add a vertex $x_B$ for each BCC $B$ of $G$, and a vertex $x_a$ for each articulation point $a$ of $G$. Then, add an edge between an articulation point $a$ and a BCC $B$ if and only if $a \in B$. The resulting graph is a tree, called the block-cut tree, or BC-tree, of $G$.
The vertices of the BC-tree that correspond to BCCs are called graph-vertices, while the ones that correspond to articulation points are called node-vertices. By construction, the tree alternates levels of node-vertices with levels of graph-vertices. We observe that, given any two nodes $s, t$ of $G$, there is a unique corresponding shortest path in the BC-tree, which we call the bead string from $s$ to $t$, and we denote with $B_{st}$. If the path is trivial, then $s$ and $t$ belong to the same BCC. Otherwise, the two endpoints of $B_{st}$ are distinct BCCs: one of them contains $s$, and the other contains $t$. Given $B_{st}$, we define its head as the first biconnected component along the bead string (in other words, the component of $B_{st}$ that contains $s$).
In the following, we give a characterization of how $st$-paths behave with respect to the BCCs of $G$. Consider the bead string $B_{st}$ in the BC-tree of $G$: its extremities (possibly equal) must contain $s$ and $t$, respectively. Keeping this notation in mind, we can restate the following result by Birmelé et al.:
Lemma 7 (Lemma 3.2 from [6]).
All the $st$-paths in $G$ are contained in the subgraph of $G$ corresponding to $B_{st}$. Moreover, all the articulation points in $B_{st}$ are traversed by each of these paths.
Thus, when looking for $st$-paths, we are actually only concerned with the subgraph of $G$ formed by the BCCs along the bead string from $s$ to $t$ in the BC-tree.
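To make these objects concrete, the sketch below computes the bead string with networkx, assembling the block-cut tree by hand from its `biconnected_components` and `articulation_points` primitives; this is plain illustrative bookkeeping, not the amortized certificate of [6].

```python
import networkx as nx

def bead_string(G, s, t):
    # Return the list of BCCs (as frozensets of nodes) along the bead
    # string from s to t, i.e., the shortest path in the block-cut tree.
    bccs = [frozenset(c) for c in nx.biconnected_components(G)]
    arts = set(nx.articulation_points(G))
    BC = nx.Graph()
    for i, c in enumerate(bccs):
        for a in arts & c:
            BC.add_edge(('B', i), ('A', a))  # graph-vertex -- node-vertex
    # Attach s and t to every component containing them; the shortest
    # path then picks the correct extremities of the bead string.
    for tag, x in (('s', s), ('t', t)):
        for i, c in enumerate(bccs):
            if x in c:
                BC.add_edge((tag, x), ('B', i))
    path = nx.shortest_path(BC, ('s', s), ('t', t))
    return [bccs[i] for kind, i in path if kind == 'B']

# Toy usage: two triangles joined by the bridge {2, 3}.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)])
print(bead_string(G, 0, 5))  # three BCCs: {0,1,2}, {2,3}, {3,4,5}
```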
Validity Check with Bead Strings
Our aim is to improve the partition schema introduced in the previous section by employing the structure and properties of the bead strings to perform an efficient validity test for the edge that is chosen at partition time (line 4 of Algorithm 6). More specifically, let $(G, s, t)$ be the instance at the current iteration; we wish to avoid choosing $e = \{s, v\}$ such that $t$ is not reachable from $v$ in $G - s$ (that is, $\mathcal{P}_{vt}(G - s) = \emptyset$), or not reachable from $s$ in $G - e$ ($\mathcal{P}_{st}(G - e) = \emptyset$). Indeed, for such bad choices the corresponding partition subproblems yield no solution. In the previous section, we checked both of these conditions by performing a linear-time reachability test at each iteration. We can avoid this computation by retaining a dynamic certificate based on the bead strings.
Indeed, by Lemma 7 we can immediately see that edges $e = \{s, v\}$ with $v$ outside $B_{st}$ are bad choices, leading to no solutions in their subproblems. On the other hand, any choice of $e = \{s, v\}$ with $v$ in the head will lead to at least one solution, since by definition of the bead string, each such $v$ satisfies $\mathcal{P}_{vt}(G - s) \neq \emptyset$. We call this branch of the partition the left child. Furthermore, if the head is non-trivial (that is, $s$ has at least two neighbors in it), then there is also always a solution in $G - e$. Indeed, since the head is biconnected, there are always at least two $st$-paths in the bead string. Thus, there is always an $st$-path that traverses $e$ (which is the left child from before), and one that does not (which we call the right child): thus $\mathcal{P}_{st}(G - e) \neq \emptyset$.
The downside of the bead string approach is that maintaining this information can be costly if done naively: computing the bead string at each step would require linear time in the worst case, giving us no advantage over the previous algorithm. It is for this reason that the notion of certificate is introduced. By employing such certificates, coupled with an amortized analysis based on the fact that only the BCCs of the head need to be updated at each step, the authors of [6] were able to achieve the optimal amortized complexity.
Certificate
The certificate is a compacted and augmented DFS tree of the bead string, rooted at $s$, which changes over time along with the corresponding bead string. Edges of the bead string are classified according to the tree as either tree edges (the ones belonging to the DFS tree) or back edges (note that there are no cross edges, as the graph is undirected), and this information is used by the certificate for making certain choices. The certificate is able to perform the following operations:
1. choose(): returns an edge $e = \{s, v\}$ with $v \in B_{st}$ (leading to the left child). As mentioned before, there always exists one such edge. This function chooses $v$ as the last such neighbor of $s$ in DFS postorder. This ensures that the (only) tree edge leaving $s$ is returned last.
2. left_update($e$): given $e = \{s, v\}$, it computes the bead string $B_{vt}$ in $G - s$ from $B_{st}$ in $G$. This also updates the head and the certificate, and returns bookkeeping information so that everything can be reverted to the status before this operation when exiting the recursive call.
3. right_update($e$): given $e$, it computes the bead string $B_{st}$ in the graph $G - e$ from $B_{st}$ in $G$. As before, it updates the head and the certificate, and returns bookkeeping information like left_update().
4. restore(): reverts all information (bead string, head, certificate) to their status before the corresponding operation (left or right update).
With these notions, we can rewrite the pseudocode for $st$-path enumeration as in Paths_certificate (Algorithm 7). Note that, since the tree edge incident to $s$ is the last one to be returned by choose(), when it is considered all remaining $st$-paths must traverse this edge, and the recursive call at line 7 would yield no solutions. Therefore, the check at line 5 is the only one necessary to ensure that all recursive calls produce solutions.
With a careful implementation and amortized analysis, it can be shown that a certificate retaining bead strings using DFS tree information achieves the optimal $O\left(m + \sum_{\pi \in \mathcal{P}_{st}(G)} |\pi|\right)$ time for the enumeration of the $st$-paths of an input graph with $m$ edges.
4.4 Assessment of st-paths
The same BCC structure highlighted in the previous section can also be used to perform the assessment task for $st$-paths.
In an assessment problem, we are asked to determine whether the number of solutions exceeds a given input threshold $z$. Assessment algorithms are positioned between counting and enumeration: their performance is akin to counting (the output is only related to the number of solutions, with no listing required), while their structure is often similar to a truncated enumeration procedure. Not only can assessment be helpful in applications where we only need to guarantee a certain amount of solutions, and thus full counting is redundant (e.g., $z$-reverse safe data structures [4, 17]), but it can be even more useful for devising efficient algorithms for #P-complete counting problems. Such problems have no polynomial-time counting algorithm unless FP = #P [71], but they can still have assessment algorithms running in time polynomial in the input size and the value of the threshold $z$. An example of this is the problem of counting the number of $st$-paths in an undirected graph [72], which will be the topic of this section. For this problem, an assessment algorithm running in $O(z \cdot m)$ time and linear space was proposed in [54]. This algorithm decomposes and arranges the $st$-paths in a tree-like structure (based on the BCCs), which is in turn used to maintain and update a combinatorial lower bound.
We start again from Lemma 7, further noting that the paths inside each of the components of $B_{st}$ can be combined independently, thus obtaining:
Corollary 8 (Corollary 2.1 of [54]).
Let $B_1, u_1, B_2, u_2, \ldots, u_{r-1}, B_r$ be the path in the BC-tree between the component containing $s$ and the component containing $t$, where $u_1, \ldots, u_{r-1}$ are node-vertices, and $B_1, \ldots, B_r$ are graph-vertices. Then, setting $u_0 = s$ and $u_r = t$:

$$|\mathcal{P}_{st}(G)| \;=\; \prod_{i=1}^{r} |\mathcal{P}_{u_{i-1}u_i}(B_i)| \qquad (1)$$
4.4.1 Main Idea: Expanding a Structural Lower Bound
Assume that we are given a structural lower bounding function $f$: that is, for any biconnected graph $B$ with nodes $x, y \in V(B)$, we have $f(B, x, y) \le |\mathcal{P}_{xy}(B)|$. Then, if we take

$$L \;=\; \prod_{i=1}^{r} f(B_i, u_{i-1}, u_i) \qquad (2)$$
we obtain that, by Corollary 8, $L$ is a lower bound on the total number of $st$-paths. At this point, if $L \ge z$, we can output YES. Otherwise, we need some way to expand (1). We can do so by exploiting the following disjoint union:
$$\mathcal{P}_{st}(G) \;=\; \biguplus_{v \in N_{B_1}(s)} \{\, s \cdot \pi \;:\; \pi \in \mathcal{P}_{vt}(G - s) \,\} \qquad (3)$$

where $N_{B_1}(s)$ is the set of neighbors of node $s$ inside $B_1$ (the head), and $s \cdot \pi$ denotes the path obtained by prepending $s$ to $\pi$. Note that this can be seen as exploring the binary partition tree of the previous section in a different way: given a current source $s$, we only consider what we called its "left children", and we explore all of them at the same step (by considering all the neighbors of $s$ that lead to $t$). Note that this leads us to conceptually create "new sources" whose disjoint union covers all paths from the original source, as we now focus on $vt$-paths for each neighbor $v$ of $s$.
The latter formula, when the lower bound is applied, translates to

$$|\mathcal{P}_{st}(G)| \;\ge\; \sum_{v \in N_{B_1}(s)} L_v,$$

where $L_v$ denotes the bound of (2) computed over the bead string from $v$ to $t$ in $G - s$. We can plug this lower bound expansion into (2), improving the total bound $L$. This refinement can be recursively applied again and again, by taking the reduced graph $G - s$, recomputing its bead string, and applying Corollary 8 to the new bead string. In this way, we can proceed until either $L$ reaches $z$, or we have effectively counted all $st$-paths and their total number does not reach $z$: here we can safely output NO.
To be employed in such a procedure, we need a structural lower bounding function $f$, meaning that it must satisfy the following:
1. Given a biconnected graph $B$ and two nodes $x, y \in V(B)$, we have $f(B, x, y) \le |\mathcal{P}_{xy}(B)|$.
2. If $B$ is trivial (size 2), then $f(B, x, y) = 1$.
In [54], the authors prove that the function $f(B, x, y) = |E(B)| - |V(B)| + 2$ satisfies such properties, and they employ it in the subsequent implementation.
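Combining Corollary 8 with the bounding function, the initial bound $L$ of (2) can be computed directly on the bead string; below is a sketch reusing the `bead_string` helper from Section 4.3.2, where the concrete expression $|E(B)| - |V(B)| + 2$ reflects our reading of the function used in [54].

```python
def chain_lower_bound(G, s, t):
    # Initial lower bound L of (2): product, over the BCCs on the bead
    # string, of f(B) = |E(B)| - |V(B)| + 2 (which equals 1 on bridges).
    L = 1
    for comp in bead_string(G, s, t):
        B = G.subgraph(comp)
        L *= B.number_of_edges() - B.number_of_nodes() + 2
    return L
```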
Multi Source Tree Data Structure
To efficiently retain the information concerning the nested lower bounds, the authors use a tree structure called the Multi Source Tree Data Structure (MSTDS). Such a tree has one node for each of the BCCs currently contributing to the global lower bound $L$ (that is, the ones appearing in the formula expanded so far).
The starting tree will consist of a path, given by the bead string of the original graph $G$. The root of the tree is the BCC containing $t$, and there is one leaf for each BCC containing a current source (that is, a node that was made a source by an expansion as per (3)). Each root-to-leaf path is a bead string for one of the current sources, and their disjoint union covers all $st$-paths of the original graph. We note that, since we start from a path, the structure obtained by repeatedly expanding the BCC corresponding to a source will always be a tree. (Some care must be taken to avoid repetition of BCCs, as the same node can become a source multiple times, but this can be easily handled by keeping multiplicities for each current source.)
Given the tree at a current step, the global bound $L$ can be computed as the sum, over all root-to-leaf paths, of the products of the bounds of the components along each path.
4.4.2 The Assessment Algorithm
The assessment algorithm can now follow quite immediately: start from a single bead string, and the MSTDS given by the corresponding path. At each step, choose a leaf $B$ of the MSTDS with a current source $v$, and expand it as per (3). Recompute the BCCs of $B - v$, and add them to the MSTDS as a subtree in place of $B$. Then, update the total lower bound accordingly. By keeping in the MSTDS nodes some information concerning the lower bounds of the paths, such a bound update can be performed in constant time.
The lower bound $L$ is improved by at least one every time a source with at least two neighbors is considered. We can find a source with this property in at most linear time (we repeatedly traverse the unique neighbor of a source, and thus a trivial BCC, for at most $n$ steps). Once we find such a source, and we expand the lower bound following (3), we need $O(m)$ time to update the BCCs after the source removal. Thus, we spend linear time per non-trivial step where we update $L$. Since we stop when the lower bound reaches $z$, we perform no more than $z$ such steps, leading to a time complexity of $O(z \cdot m)$.
4.5 More about the Extension Problem
In Section 4.3, we introduced the so-called extension problem, which consists of determining whether a partial solution can be extended into a complete one. We have seen that this check is fundamental for the flashlight method, which ensures polynomial delay by verifying the existence of at least one solution before making a recursive call. This mechanism effectively avoids dead-ends, such as those encountered in the Bron-Kerbosch algorithm when listing maximal cliques (see Section 4.2.1).
Since these dead-ends prevent the Bron-Kerbosch algorithm from being output-sensitive, one might consider defining an extension problem to guide its recursion. Specifically, in Algorithm 5, before making the recursive call at line 6, it would be desirable to check whether there exists a maximal clique containing all vertices of $R$, some vertices from $P$, and none from $X$. However, this problem has been proven to be NP-complete [22], implying that such a check would make each recursive step intractable unless P = NP. Consequently, the Bron-Kerbosch algorithm has exponential delay, even when different pivoting strategies are considered [22].
It is important to note that the hardness of the extension problem for maximal cliques is not an isolated case [9]. In fact, Brosse et al. have shown that the extension problem is NP-complete for every "interesting" hereditary property. Notable examples include maximal $k$-degenerate induced subgraphs of a chordal graph [21] and maximal Steiner subgraphs [16], the latter of which remains hard even for just three terminals.
Furthermore, it is worth emphasizing that the hardness of the extension problem does not preclude the existence of output-sensitive algorithms. In some cases, two possible approaches can be considered. First, we may focus on restricted versions of the extension problem; for instance, in the case of maximal Steiner subgraphs with three terminals, if the partial solution is already connected, the extension problem becomes polynomial. Second, alternative techniques can be employed, such as reverse search for maximal clique enumeration, as discussed later.
5 Solution Graph Techniques for Enumeration
In Section 4, we introduced partition-type enumeration algorithms. This type of algorithm simply explores a solution space by dividing it into smaller subspaces. However, when enumerating only maximal or minimal solutions, this approach usually faces hard subproblems, like extension problems. Hence, we need to study the “structure” of the solution space to obtain an efficient enumeration algorithm.
In this section, we present another approach, called the solution graph technique. Intuitively, algorithms based on this technique traverse a graph, called the solution graph, defined on the set of solutions. The vertex set of the solution graph is the set of solutions. The main difficulty of this technique concerns how to define a good edge set for the solution graph, that is, how to define the neighbourhood of each solution. Fortunately, we have several standard strategies to do so. In the following, we will introduce some popular techniques such as the reverse search method, the input restricted problem method, and proximity search.
5.1 Reverse Search
The reverse search method [3] is one of the most common ways to define a solution graph $\mathcal{G}$. To enumerate all the solutions of $\mathcal{G}$, we perform a depth-first search on $\mathcal{G}$. However, if $\mathcal{G}$ is disconnected, then we need to find solutions to start from in each of its connected components. Moreover, if $\mathcal{G}$ is cyclic, then we may need an exponential-size stack to avoid outputting duplicate solutions. Hence, to achieve space-efficient enumeration, it is desirable for $\mathcal{G}$ to be connected and acyclic, that is, to form a tree.
A solution graph defined by the reverse search method is called the family tree. As its name suggests, the family tree is connected and acyclic. To define the tree, it is important to define the "root" and a good "parent rule" in the tree. When we traverse the tree, performing enumeration, we start from the root and go downwards, traversing the tree edges in the reverse direction, from parents to children (hence the name reverse search). Therefore, to obtain an efficient algorithm, we also need to provide an efficient procedure for finding all the "children" of each solution. In the following subsections, we will give two examples of constructing a family tree for enumeration problems. We will start by re-considering the problem of enumerating all subsets of bounded sum (as seen in Section 4.1.1) with this new approach. Then, we will move to a more involved example concerning maximal clique enumeration, finally achieving output-sensitive time complexity for this problem, in contrast with the previous algorithm of Section 4.2.1.
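Stripped to its essentials, reverse search is thus a traversal of the family tree from the root, generating children on the fly; a generic sketch follows (where `children` must invert the problem-specific parent rule):

```python
def reverse_search(root, children, output):
    # DFS on the family tree: since the parent rule makes the solution
    # graph a tree, no visited-set is needed to avoid duplicates.
    stack = [root]
    while stack:
        S = stack.pop()
        output(S)
        stack.extend(children(S))  # exactly the solutions whose parent is S
```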
5.1.1 Enumeration of Subsets of Bounded Sum: Reverse Search
In Section 4.1.1, we gave an enumeration algorithm based on a partition approach (Algorithm 4). Interestingly, this algorithm can be interpreted as a reverse search-based algorithm. Let $\mathcal{S}$ be the set of solutions on the collection $A = \{a_1, \ldots, a_n\}$. The root of the family tree is the empty set. Let $S = \{a_{i_1}, \ldots, a_{i_k}\}$ be a non-root solution with cardinality $k$, where $i_1 < \cdots < i_k$. Then, we can define the parent of $S$ as the set obtained by removing the element with the largest index in $S$: that is, $P(S) = S \setminus \{a_{i_k}\}$. Note that $P(S)$ is clearly in $\mathcal{S}$. Now, we define the family tree $\mathcal{T}$ as follows:

$$\mathcal{T} = (\mathcal{S}, E_{\mathcal{T}}), \qquad (4)$$

where $E_{\mathcal{T}} = \{\{S, P(S)\} : S \in \mathcal{S} \setminus \{\emptyset\}\}$.
We first show that the family tree is connected and acyclic. A typical approach to show this is to use a function $f$ on the solutions such that:
1. For the root $R$, $f(R)$ has the minimum value among the solutions, and
2. For any non-root solution $S$, we have $f(P(S)) < f(S)$.
Here, a candidate for such a function is the cardinality of a solution. By using such an $f$, we can then easily obtain the following lemma:
Lemma 9.
For any solution $S \in \mathcal{S}$, there is a path from $S$ to the root $\emptyset$ in $\mathcal{T}$. Moreover, $\mathcal{T}$ is acyclic.
Proof.
For any non-root solution $S$, $|P(S)| < |S|$. Hence, by recursively taking parents starting from $S$, we eventually reach the empty set, that is, the root. Thus, there is a path from $S$ to the root. Moreover, if there is a cycle in $\mathcal{T}$, then the cycle must contain an edge $\{S, P(S)\}$ with $|P(S)| \ge |S|$. However, this contradicts that $P(S)$ is the parent of $S$. Thus, the lemma holds.
Next, we show the definition of the children. Recall that, for an element $a_i$, $i$ is its index in $A$. Then, we define the children of $S$ as follows:

$$ch(S) = \{\, S \cup \{a_i\} \;:\; a_i \in A \setminus S,\ i > \max\{j : a_j \in S\},\ S \cup \{a_i\} \in \mathcal{S} \,\} \qquad (5)$$
This definition is justified by the following lemma:
Lemma 10.
For any solution $S$, $ch(S) = \{S' \in \mathcal{S} : P(S') = S\}$.
Proof.
Let $S$ and $S'$ be any pair of solutions. First, we show that if $S' \in ch(S)$ then $P(S') = S$. Since $S'$ is a child of $S$, $S' = S \cup \{a_i\}$ for some $a_i$ whose index $i$ is larger than that of any element of $S$. Hence, $a_i$ has the maximum index in $S'$, and from the definition of the parent relationship, $P(S') = S$.

Next, we show the other direction. Assume that $P(S') = S$. Then, from the definition of the parent, $S'$ is obtained from $S$ by adding an element whose index is larger than that of any element in $S$. Thus, from the definition of $ch(S)$, $S' \in ch(S)$.
From the above discussion, we can give another proof of the existence of an enumeration algorithm for this problem with $O(1)$ amortized time per solution, given by the traversal of this family tree.
5.1.2 Maximal Clique Enumeration via Reverse Search
Next, we give a (slightly) more complicated example. Recall that a clique of a graph $G$ is a vertex subset $K$ such that for any pair of vertices in $K$, there is an edge between them in $G$. Moreover, a clique $K$ is maximal if there is no clique $K'$ such that $K \subsetneq K'$. Several enumeration algorithms for maximal cliques can be found in the literature [50, 40, 2]. In this section, we explain the algorithm based on reverse search by Makino and Uno [50]. As in the previous section, we need to define the root and the parent-child relation of the family tree.
First, we define the root of the family tree. Suppose that $V = \{v_1, \ldots, v_n\}$ is a totally ordered set, such that $v_i < v_j$ for any $i < j$. For two maximal cliques $K_1, K_2$, we say that $K_1$ is lexicographically smaller than $K_2$ if the vertex with minimum index in their symmetric difference belongs to $K_1$, and we write $K_1 \prec K_2$. Then, let $K_0$ be the maximal clique that is lexicographically smallest among all the maximal cliques in $G$. We say a maximal clique is the root if it is equal to $K_0$.
Next, we define the parent of a maximal clique $K \neq K_0$. For an integer $i$, let $K_{\le i} = K \cap \{v_1, \ldots, v_i\}$, and let $C(K_{\le i})$ be the lexicographically smallest maximal clique containing $K_{\le i}$. Note that $C(K_{\le i})$ may be equal to $K$.

Thus, we define the parent index of a maximal clique $K \neq K_0$ as the maximum index $i$ such that $C(K_{\le i}) \neq K$. Consequently, we define the parent of $K$ as $P(K) = C(K_{\le i})$, where $i$ is the parent index of $K$.
Clearly, $P(K) \prec K$ holds, and thus there can be no cycles. It also follows that every maximal clique other than $K_0$ has a parent index, and thus a parent. Hence, the parent-child relationship induces a tree structure rooted at $K_0$.
Computing the parent of a maximal clique is easy. However, obtaining all the children of a clique is not obvious. The crucial observation for child enumeration is the following: let $K'$ be a child of $K$; then, for any $i$ larger than the parent index of $K'$, $C(K'_{\le i}) = K'$. Hence, children can be found by choosing a vertex $v_i \notin K$, removing from $K$ the vertices having indices larger than $i$ (as well as the non-neighbors of $v_i$), adding $v_i$ to the remaining part of $K$, and greedily adding vertices to obtain a maximal clique. By this construction, $i$ is the parent index of the resultant clique. In the following, we give more details. Given a maximal clique $K$ and an index $i$, let

$$C[K, i] = C\big((K_{\le i} \cap N(v_i)) \cup \{v_i\}\big).$$

$C[K, i]$ can be interpreted as follows: we first pick the vertices of $K_{\le i}$ adjacent to $v_i$, add $v_i$, and then greedily add vertices to obtain a maximal clique. Therefore, the following lemma holds.
Lemma 11 ([50, Lemma 2]).
Let $K$ and $K'$ be maximal cliques in $G$. Then $K'$ is a child of $K$ if and only if $K' = C[K, i]$ holds for some index $i$ such that:
1. $v_i \notin K$;
2. $i$ is larger than the parent index of $K$;
3. $K'_{\le i} = (K_{\le i} \cap N(v_i)) \cup \{v_i\}$;
4. $K = C(K'_{\le i})$.
Moreover, if an index $i$ satisfies the above four conditions, then $i$ is the parent index of $K'$.
From the above lemma, we can obtain the children of $K$ in polynomial time, yielding a polynomial delay enumeration algorithm for maximal cliques.
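The basic primitive behind the algorithm is the completion $C(\cdot)$; below is a direct, unoptimized Python sketch of it and of the resulting parent computation (vertices are integers $0, \ldots, n-1$ in index order, and `adj` maps each vertex to its neighbor set; all names are ours).

```python
def completion(adj, S):
    # C(S): lexicographically smallest maximal clique containing clique S,
    # obtained by greedily adding the smallest feasible vertex index.
    K = set(S)
    for v in sorted(adj):
        if v not in K and all(v in adj[u] for u in K):
            K.add(v)
    return K

def parent(adj, K):
    # Parent of a non-root maximal clique K: C(K_{<=i}) for the largest
    # index i such that C(K_{<=i}) != K.
    for i in sorted(adj, reverse=True):
        Ki = {u for u in K if u <= i}
        P = completion(adj, Ki)
        if P != K:
            return P
    return None  # K is the root K_0 and has no parent
```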
5.2 Input-restricted Problem
While reverse search is a powerful tool, one of its drawbacks is the complexity involved in defining a working “parent-child” relationship. However, there actually exists a simple and general way to obtain a solution graph for a large class of problems, which in some cases immediately yields efficient enumeration algorithms for maximal solutions. This method is typically referred to as the Input-Restricted Problem, or hereafter IRP.
5.2.1 Designing the algorithm
The key intuition can already be found in Lawler et al. [49], generalizing algorithms by Paull and Unger [53] and by Tsukiyama et al. [67]. The idea is the following: given a maximal solution $S$ of a given enumeration problem and some element $v$, the hardness of listing the solutions that are maximal within $S \cup \{v\}$ is linked to the hardness of listing the maximal solutions of the general problem.
This has then been formalized as the Input-Restricted Problem by Cohen et al. [15], who also restrict their attention to the enumeration of maximal solutions in graphs (although the technique can potentially be applied to any enumeration problem where solutions are maximal sets of elements). Formally:
Definition 12 (Input-Restricted Problem).
Let $G$ be a graph and $\Pi$ a property (such as being a clique, or a path). Let $S$ be a maximal solution, i.e., an inclusion-maximal set of nodes respecting $\Pi$, and $v$ any node of $G$.

The Input-Restricted Problem asks to enumerate all maximal solutions of $\Pi$ in the induced subgraph $G[S \cup \{v\}]$. (The definition is identical for problems where solutions are sets of edges, simply requiring $S$ to be a set of edges and $v$ to be an edge.)
Intuitively, we can use a maximal solution $S'$ of $G[S \cup \{v\}]$ to generate a new maximal solution of $G$. Whenever this happens, we say there is an edge from $S$ to the new solution in the solution graph (optionally labelled with $v$). Another significant upside of the technique is that the new solution can be any arbitrary maximal solution of $G$ that includes $S'$. Observe that $S$ itself is always a maximal solution of $G[S \cup \{v\}]$, but we typically avoid returning it as it would only add a futile edge from $S$ to $S$.
The catch is the following: this technique assumes that the property $\Pi$ at hand is hereditary, i.e., if $S$ is a solution, then any subset of $S$ must itself be a solution. Even if many properties respect this condition (e.g., cliques and most of their relaxations, independent sets, forests), not all properties do, and the IRP technique may not be the most suitable for the latter (e.g., cycles and spanning trees).
If the property is hereditary, then we can easily define a simple maximalization procedure $\mathrm{complete}(S')$, which iteratively adds an arbitrary element of $V \setminus S'$ to $S'$ until it is no longer possible to do so without breaking $\Pi$. With this, we have all the elements for building an enumeration algorithm, which is a DFS-like traversal of the solution graph (see Algorithm 8).
The main advantage of the technique is now clear: if we simply plug in a solver for the input-restricted problem , the algorithm will enumerate all maximal solutions to the general problem.
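Algorithm 8 is not reproduced here, but its structure can be sketched in a few lines of Python (ours, with hypothetical parameter names): restricted_solver(S, v) is assumed to enumerate the maximal solutions of the input-restricted problem on $G[S \cup \{v\}]$, complete is the maximalization procedure, and start is any maximal solution:

```python
def enumerate_maximal(V, restricted_solver, complete, start):
    """DFS over the solution graph: the out-neighbors of S are the
    completions of the maximal solutions of the input-restricted problem."""
    seen = {frozenset(start)}                 # dictionary of found solutions
    stack = [frozenset(start)]
    while stack:
        S = stack.pop()
        print(sorted(S))                      # output S
        for v in V - S:
            for Sp in restricted_solver(S, v):
                T = frozenset(complete(Sp))
                if T not in seen:             # "was not already found"
                    seen.add(T)
                    stack.append(T)
```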
The completeness of this algorithm for hereditary properties relies on the idea that the solution graph generated with the IRP is strongly connected, thus a traversal from any starting point will visit every solution. Proving this is actually quite simple. It relies on showing that if you start at a solution $S$ and want to reach a solution $S^*$, there is always a $v \in S^*$ and a solution $S'$ of $G[S \cup \{v\}]$ such that $\mathrm{complete}(S')$ has a larger intersection with $S^*$ than $S$ had: applying this repeatedly must inevitably yield $S^*$ in a finite number of steps. More detailed proofs can be found in [15].
Let us then discuss the drawbacks of this approach, and how some of them can be solved.
Firstly, checking that a solution "was not already found" is necessary to avoid duplication, but it is no trivial task. A way to do this is to store every solution found so far in a dictionary, and use it to check whether each newly generated solution is new. This is efficient time-wise (a trie or a block tree can answer the query in linear time), but as there can be exponentially many solutions, it makes the space usage exponential. However, Cohen et al. [15] show that, for hereditary properties, we can avoid a dictionary and induce a tree-like structure based on solving the problem incrementally. Essentially, they use the solutions of $G[\{v_1, \ldots, v_i\}]$ to generate those of $G[\{v_1, \ldots, v_{i+1}\}]$, much like Tsukiyama et al. [67], in a DFS-like fashion. This alternative algorithm has the advantage of reducing space usage, but it does not operate anymore on the solution graph, as the actual maximal solutions of $G$ correspond only to the leaves of this tree-like structure.
It is thus worth noting that Algorithm 8 has a striking similarity to reverse search, and indeed it is also possible to achieve polynomial space by inducing a parent-child relationship, identifying a parent and a parent index for each solution in the same way as Section 5.1, and turning the condition in Line 8 into "if the parent of the new solution is $S$, and its parent index is $v$". In this light, the Input-Restricted Problem becomes a powerful and general tool for building reverse search algorithms.
5.2.2 Supported classes of properties
As discussed above, the Input-Restricted Problem is easy to deal with if the property at hand is hereditary (i.e., subsets of solutions are solutions). However, Cohen et al. [15] generalize the proof of correctness of Algorithm 8 to the more general class of connected-hereditary graph properties, i.e., properties for which any connected subgraph of a solution is itself a solution.
To give an example, induced forests are a hereditary property: if a set of nodes $S$ induces a forest, any subset of $S$ induces a forest. Induced trees, instead, are a connected-hereditary property: if $S$ induces a tree, a subset of $S$ in general induces a forest, but a subset of $S$ that induces a connected subgraph will induce a tree. Inducing a parent-child relationship in this class is more challenging, and Cohen et al. [15] rely on a solution dictionary for avoiding duplication. However, Conte et al. [20] propose a general technique that allows the definition of a parent-child relationship within the structure of Algorithm 8 even for connected-hereditary properties (technically, for the more general class of "strongly accessible and commutable" properties, whose definition can be found in [20], although most naturally defined properties in the class will typically fall inside connected-hereditary), thus allowing enumeration in polynomial space.
Finally, note that properties like induced cycles are neither hereditary nor connected-hereditary, as a subset of a cycle will never be a cycle; properties of this kind cannot be solved directly by Algorithm 8.
5.2.3 Complexity of the Algorithm and the IRP
The final piece of the puzzle is the Input-Restricted Problem itself.
It is relatively easy to see that, in Algorithm 8, the complexity of the IRP essentially corresponds to the amount of work done per solution: every time a new solution is found, we solve the IRP up to $|V|$ times (once for each different $v$), and for each solution returned we just need to apply $\mathrm{complete}$. This means that the cost-per-solution of the algorithm is just a polynomial multiplied by the cost of the IRP. (The polynomial of course varies according to whether we can limit the number of considered $v$ and the cost of $\mathrm{complete}$, but it is always polynomial for hereditary and connected-hereditary properties, as long as we can identify in polynomial time whether a set is a solution or not, as we can then simply iteratively test every node for addition until none is possible.) While intuitively a simpler task, the IRP does not always allow efficient resolution: the number of maximal solutions within the IRP can greatly vary from problem to problem, leading to different complexities.
One of the simplest cases is maximal cliques. Given a maximal clique $K$ and a node $v$, $G[K \cup \{v\}]$ has just two maximal cliques:
-
$K$ itself ($v$ cannot be added to it, as $K$ was already maximal in $G$);
-
$(K \cap N(v)) \cup \{v\}$, i.e., $v$ and all its neighbors in $K$ (any other element of $K$, not being adjacent to $v$, cannot extend the clique).
As the IRP has only two solutions (and only one we care about, since $K$ was already known), it can be solved for maximal cliques in polynomial time. This means the total amount of work done by Algorithm 8 per solution is polynomial, i.e., we achieve polynomial amortized time. (In fact, we can turn this into polynomial delay with the alternative output technique [69], which simply makes us output the current solution at the beginning of the recursive call if the recursion depth is even, and at the end of it if the recursion depth is odd. This avoids long chains of recursive calls being closed (or opened) without new outputs being produced.)
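In code, the input-restricted solver for maximal cliques is a two-liner; the sketch below (ours) follows the adjacency-set convention used earlier and can be plugged into the enumerate_maximal sketch above as restricted_solver = lambda S, v: restricted_cliques(adj, S, v):

```python
def restricted_cliques(adj, K, v):
    """The two maximal cliques of G[K ∪ {v}] for a maximal clique K."""
    yield frozenset(K)                                     # usually skipped: already known
    yield frozenset({u for u in K if u in adj[v]} | {v})   # v plus its neighbors in K
```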
This is not always the case: the Input-Restricted Problem may have an exponentially high number of solutions, meaning it cannot be solved in polynomial time. Even in this case, not all hope is lost: some properties, like Maximal Connected Induced Bipartite Subgraphs [73] or Maximal Temporal Cliques in a restricted class of temporal graphs [11] allow the IRP to be solved in polynomial delay.
More generally, we can say that whenever the IRP can be solved in Incremental Polynomial Time (which includes polynomial delay), then Algorithm 8 also runs in Incremental Polynomial Time. Intuitively, this can be proved by the fact that, if $k$ solutions were found so far, the time used by the IRP to re-find solutions that we already know is polynomial in $k$ and the input size, and since we apply the IRP only to the $k$ solutions found so far, the time required until a new solution is found is also bounded by a polynomial in $k$ and the input size.
Finally, there are cases where even the IRP simply cannot be solved efficiently: Lawler et al. [49], for example, show how to reduce an instance of SAT to a simple hereditary property on a set of elements, where maximal solutions can be easily identified in polynomial time, and a further solution exists if-and-only-if the input formula is satisfiable. It follows that no output-sensitive algorithm may exist for this problem, unless P = NP. This tells us that even an efficient solution to the IRP would imply P = NP, as we could use it in Algorithm 8 to obtain an efficient algorithm for the problem.
5.3 Proximity Search
We have seen above how the Input-Restricted Problem is a powerful tool for designing enumeration algorithms, but sometimes it cannot provide an efficient solution. It is natural to ask: can we go beyond it, and obtain polynomial delay even when the IRP has exponentially many solutions?
The answer is “sometimes yes”, and one of the techniques to do this is called Proximity Search [23]. The basic idea is simple: if the number of solutions to the input-restricted problem is exponential, this corresponds to having an exponential out-degree in the solution graph; we then want to ignore some of the edges, and still prove that the solution graph is strongly connected.
The resulting algorithm is identical in structure to Algorithm 8, with the only difference being some specific, problem-dependent function in place of the IRP to produce new solutions (i.e., the out-neighbors of the current solution $S$). The complexity relies on finding an efficient such function and proving that the resulting solution graph is strongly connected.
5.3.1 Completeness
To prove strong connectivity we want to say that, starting from any solution $S$, and traversing edges of the solution graph, we will eventually find a path to any other solution $S^*$. The Input-Restricted Problem strategy, on hereditary properties, relies on intersection: for any pair $S, S^*$, there exists a $v$ for which the IRP finds an $S'$ such that $|\mathrm{complete}(S') \cap S^*| > |S \cap S^*|$, and this tells us $\mathrm{complete}(S')$ is the next step in the path towards $S^*$.
A key step of proximity search is replacing the monotone increase in intersection with a weaker, more general, and possibly problem-specific condition, that is called proximity, and denoted as $S \,\tilde\cap\, S^*$. In essence, a proximity search algorithm requires:
-
An arbitrary starting solution $S_0$.
-
A proximity function $S \,\tilde\cap\, S^*$ that, for any fixed $S^*$, is maximized when and only when $S = S^*$. (Note that we do not care about the efficiency of this function, as it is used in the proofs and never in the algorithm!)
-
A function $\mathrm{neighbors}(S)$ such that, for any other solution $S^*$, there is $S' \in \mathrm{neighbors}(S)$ for which $|S' \,\tilde\cap\, S^*| > |S \,\tilde\cap\, S^*|$.
The completeness then follows from the same logic above, as $S$ always has an out-neighbor that gets "one step closer" to $S^*$. By the same logic as Algorithm 8, if $\mathrm{neighbors}(S)$ takes only polynomial time, then the enumeration has polynomial delay.
Just like Algorithm 8, we also need to check whether any solution was already found, to ignore duplicates, and that means again storing all solutions in a dictionary and using exponential space. We will see later how it is sometimes possible to overcome this.
However, rather than solving the problem, we have so far simply formalized the requirements. Next, we introduce a useful technique to actually design proximity search algorithms.
5.3.2 Canonical reconstruction
Canonical reconstruction, formalized in [19], is an effective method to implement proximity search.
The key ingredient is defining, for each solution $S$, a solution order $s_1, \ldots, s_{|S|}$ of its elements. This order will be problem-specific, and ideally allows us to exploit the problem's structure to our advantage. Then, canonical reconstruction defines proximity as follows:
Definition 13 (Proximity in canonical reconstruction).
Given two solutions $S$ and $S^*$, let $s^*_1, \ldots, s^*_{|S^*|}$ be the solution order of $S^*$. Then $S \,\tilde\cap\, S^*$ is defined as the set of elements in the longest prefix of this order that is fully contained in $S$.
A couple observations are in order:
-
$S \,\tilde\cap\, S = S$, as $S$ contains all elements in its own solution order.
-
$S \,\tilde\cap\, S^* \subseteq S \cap S^*$, and in particular if $S \,\tilde\cap\, S^* = S^*$, then $S \supseteq S^*$ and thus $S = S^*$ by maximality. Indeed $S$ may contain several elements of $S^*$, but we only care about those that align into a prefix of the solution order of $S^*$.
-
In general $S \,\tilde\cap\, S^* \neq S^* \,\tilde\cap\, S$, since by definition $S^* \,\tilde\cap\, S$ will instead consider the longest prefix of the solution order of $S$ that is fully contained in $S^*$.
Finally, given $S$ and $S^*$ with $S \neq S^*$, we call canonical extender for the pair the earliest element $\dot{s}$ in the solution order of $S^*$ that is missing from $S$.
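As a small illustration (ours, not from [19]), the proximity of Definition 13 amounts to a prefix scan; recall that this function only appears in the correctness proofs and is never executed by the enumeration algorithm itself:

```python
def proximity(S, order_of_Sstar):
    """S ∩~ S*: the longest prefix of the solution order of S*
    that is fully contained in the solution S (a set)."""
    prefix = []
    for s in order_of_Sstar:
        if s not in S:
            break              # the prefix stops at the canonical extender
        prefix.append(s)
    return prefix
```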
The directive, in canonical reconstruction, is finding a maximal solution $S' \in \mathrm{neighbors}(S)$ that contains $(S \,\tilde\cap\, S^*) \cup \{\dot{s}\}$. $S'$ may very well have a smaller intersection with $S^*$ than $S$ had, but its proximity will have increased by at least one.
This is by no means a conclusive guide as, again, canonical reconstruction relies on finding a problem-specific solution order and exploiting its structure. However, many graph properties are associated with specific decompositions that induce an order, and this often turns out to be a powerful way to exploit these decompositions.
For completeness, to showcase the technique, we give here the example of Maximal Induced Chordal Subgraphs. More examples can be found in [19].
5.3.3 Maximal Induced Chordal Subgraphs
A graph is chordal if it has no induced cycle of length greater than three, i.e., any longer cycle has a shortcut edge (chord). Here we want to use canonical reconstruction to enumerate all maximal $S \subseteq V$ such that $G[S]$ is chordal.
Observe that this is a hereditary property, since an induced subgraph of a chordal graph cannot contain a new induced cycle: any chord present in the larger graph is preserved in the induced subgraph. Moreover, as shown in [19], the Input-Restricted Problem for this property can have exponentially many solutions, so it cannot be used to achieve polynomial delay.
We will use two key properties of chordal graphs:
-
A chordal graph has at most $n$ maximal cliques.
-
A graph is chordal if and only if it allows a perfect elimination order.
A perfect elimination order is an elimination order of the vertices obtained by iteratively removing simplicial vertices, where a vertex is simplicial if its neighbors form a clique.
Given a solution $S$, we define its solution order as a reversed perfect elimination order of $G[S]$, that is, an order $s_1, \ldots, s_{|S|}$ where, for each $s_i$, the neighbors of $s_i$ preceding it in the order form a clique.
Now, take any pair of solutions $S, S^*$, let $s^*_1, \ldots, s^*_{|S^*|}$ be the solution order of $S^*$, and let $\dot{s}$ be the canonical extender for $S, S^*$. As described above, our goal will be, given $\dot{s}$, to produce a solution $S'$ containing $(S \,\tilde\cap\, S^*) \cup \{\dot{s}\}$. It is crucial to observe that we do not know $S^*$. The function $\mathrm{neighbors}(S)$ must be able to satisfy such a condition for any of the (exponentially many) other solutions $S^*$, even though it must take polynomial time, so it is only allowed to compute polynomially many solutions. The good news is that we can simply try to use every vertex of $V \setminus S$ as canonical extender, so eventually we will always consider the correct $\dot{s}$.
This is where the structure of the problem unravels: we know that $(S \,\tilde\cap\, S^*) \cup \{\dot{s}\} \subseteq S^*$, and crucially, we designed the solution order so that the neighbors of $\dot{s}$ preceding it form a clique; in particular, the neighbors of $\dot{s}$ in $S \,\tilde\cap\, S^*$ form a clique, meaning that this set is contained in a maximal clique of $G[S \cap N(\dot{s})]$; let this clique be $C$. With knowledge of $\dot{s}$ and $C$, we can produce a suitable $S'$ as follows: $S' = \mathrm{complete}\big((S \setminus N(\dot{s})) \cup C \cup \{\dot{s}\}\big)$.
Observe that $S'$ must contain all of $(S \,\tilde\cap\, S^*) \cup \{\dot{s}\}$: the vertices of $S \,\tilde\cap\, S^*$ that were not neighbors of $\dot{s}$ were never removed from $S$, and the vertices that were neighbors of $\dot{s}$ were added back by including $C$. Finally, applying the complete function gives us a maximal solution, but does not remove any vertex.
Moreover, observe that $G[S']$ is chordal: $(S \setminus N(\dot{s})) \cup C$ is a subset of $S$, so it induces a chordal graph because the property is hereditary, and the further addition of $\dot{s}$ does not break chordality, because $\dot{s}$ is simplicial in the resulting graph (its neighbors there are exactly the clique $C$), so the graph allows a perfect elimination ordering starting from $\dot{s}$. By definition, applying the complete function does not break chordality.
The final piece of the puzzle is the fact that, while we do not know $S^*$ (and hence neither $\dot{s}$ nor $C$), we know that $G[S \cap N(v)]$ is chordal and thus has at most $|S|$ maximal cliques, so we can indeed try all possible combinations of $v$ and $C$, obtaining just $O(n^2)$ candidate solutions.
Specifically, $\mathrm{neighbors}(S)$ will produce these solutions:
-
for each $v \in V \setminus S$,
-
for each maximal clique $C$ of $G[S \cap N(v)]$,
-
return $\mathrm{complete}\big((S \setminus N(v)) \cup C \cup \{v\}\big)$.
From what was observed above, for any possible choice of $S^*$, there will be one correct choice of $v$ and $C$ such that the obtained $S'$ satisfies $|S' \,\tilde\cap\, S^*| > |S \,\tilde\cap\, S^*|$. Again, observe how there are exponentially many possible $S^*$, and yet a polynomial number of candidate solutions $S'$ is sufficient to satisfy the condition.
The result is a polynomial-delay algorithm for enumerating Maximal Induced Chordal Subgraphs.
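A compact sketch of this $\mathrm{neighbors}$ function (ours; the clique-listing subroutine is a generic Bron–Kerbosch rather than the dedicated linear-time procedure one would use on a chordal graph) may look as follows:

```python
def maximal_cliques(adj, verts):
    """Bron-Kerbosch on the subgraph induced by verts; on a chordal
    graph the output has at most |verts| cliques."""
    def bk(R, P, X):
        if not P and not X:
            yield frozenset(R)
            return
        for v in list(P):
            yield from bk(R | {v}, P & adj[v], X & adj[v])
            P.remove(v)
            X.add(v)
    yield from bk(set(), set(verts), set())

def neighbors_chordal(adj, V, S, complete):
    """Candidates complete((S \\ N(v)) ∪ C ∪ {v}) for every vertex v
    outside S and every maximal clique C of G[S ∩ N(v)]."""
    for v in V - S:
        for C in maximal_cliques(adj, S & adj[v]):
            yield frozenset(complete((S - adj[v]) | C | {v}))
```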
5.3.4 Extensions and alternatives
Canonical reconstruction has one more advantage: [19] shows that it is sometimes possible to induce a parent-child relationship by adapting the one designed in [20] for connected-hereditary (commutable) properties. We refer the reader to [19] for the details, but it is remarkably easy to determine when this is applicable: as a rule of thumb, it is possible to induce a parent-child relationship and perform proximity search in polynomial space when the solution order of canonical reconstruction is defined via a breadth-first search.
This is not the case for the example shown above of chordal subgraphs, which used a reversed perfect elimination order. However, [19] shows how to apply it to several other problems such as bipartite subgraphs, induced trees, and connected acyclic subgraphs.
Moreover, [10] actually showed how to compute a suitable solution order for chordal subgraphs in a BFS-like fashion, and thus produced a proximity search algorithm for maximal chordal subgraphs with polynomial delay and polynomial space.
Finally, we remark how techniques for navigating a solution graph can get arbitrarily interesting and more complex. A great example is the retaliation-free paths by Cao [12]: in essence, while canonical reconstruction finds a path from $S$ to $S^*$ by adding the canonical extender $\dot{s}$ without losing $S \,\tilde\cap\, S^*$, retaliation-free paths work on a similar logic but with a key difference: they consider a solution order for $S^*$, but allow the addition of $\dot{s}$ to remove elements of $S \,\tilde\cap\, S^*$, as long as we can prove that, along the path, the elements sacrificed will be added again without "retaliation", i.e., without removing $\dot{s}$ again. Proving the existence of retaliation-free paths is not straightforward, but [12] successfully uses them to achieve polynomial delay on multiple enumeration problems for which the input-restricted problem is not solvable in polynomial time.
6 Amortization analysis
In Section 4, we presented techniques for designing efficient enumeration algorithms, based on the partition technique. Algorithms designed using this approach typically achieve polynomial delay and polynomial space. However, when aiming for more efficient algorithms, such as those with (sub-)linear delay or average (sub-)linear delay, a more refined algorithm design is required. In algorithms based on the partition technique, the bottleneck is often the procedure of dividing the problem into subproblems. A straightforward way to improve efficiency is to reduce the cost of this partition procedure. However, achieving (sub-)linear time improvements is challenging.
One example is enumeration problems related to connectivity, such as $st$-path enumeration. In this problem, the partition often involves vertex or edge deletions. After deleting some vertices or edges, determining whether $s$ and $t$ remain in the same connected component is known as the dynamic connectivity problem. Solving this problem in time sublinear in $n + m$ is non-trivial, where $n$ and $m$ are the number of vertices and edges, respectively.
To reduce the cost of partitioning, amortized analysis can be effectively applied in algorithm design [46, 47, 18]. Typically, the partition method generates at most two subproblems. However, for certain problems, it is possible to generate more subproblems with nearly the same cost. In such cases, if it can be shown that the cost is proportional to the number of generated subproblems, the average delay can be improved. In enumeration algorithms, rather than focusing solely on reducing the cost for each partition, designing algorithms that consider the relationship between the number of solutions and the cost often leads to simple and efficient algorithms. In this section, we give several such examples.
When discussing algorithms with delay or average delay bounds, as in this section, it is common to consider preprocessing time separately. This is because, for many problems, achieving (sub-)linear delay is impossible for inputs with no solutions, leading to special treatment of the preprocessing time. In the following algorithms, it is easy to show that the preprocessing time is bounded by $O(n + m)$ time. Thus, we omit the analysis of the preprocessing time.
The correctness of the algorithms presented in this section immediately follows from the partition technique. Thus, we also omit the proof of correctness of the algorithms.
6.1 Amortization by Children and Grandchildren
A typical technique in amortized analysis is to analyze the computation time based on the number of children or grandchildren. If the computation time at each recursive call $X$ is $O(f \cdot |ch(X)|)$ time, where $ch(X)$ is the set of children of $X$ and $f$ is some function, then the total computation time can be bounded by $O(f \cdot N)$ time, where $N$ is the total number of recursive calls. Therefore, designing an algorithm such that its computation time depends on the number of children and grandchildren can often lead to improvements in overall efficiency.
6.1.1 Enumerating all paths starting from $s$
We consider the problem of enumerating the paths with a specified starting vertex $s$ but no specified end vertex. For this problem, we can obtain an average constant-time algorithm by using a simple partition approach. We give our algorithm in Algorithm 9. Since Algorithm 9 is a recursive algorithm, it generates a tree structure. Hereafter, we denote this tree as $\mathcal{T}$ and we call a vertex of $\mathcal{T}$ a recursive call.
In this algorithm, each recursive call runs in $O(\Delta)$ time and outputs one path starting from $s$, where $\Delta$ is the maximum degree of the graph. Therefore, an immediate analysis shows that the average time complexity is $O(\Delta)$ per solution. However, using amortized analysis, we can show that this algorithm actually achieves average constant delay.
For each recursive call $X$, the running time of $X$ is $O(\Delta)$. More precisely, $X$ runs in $O(d(u))$ time, where $u$ is the other endpoint of the current path $P$ and the degree is taken in the graph with the other vertices of $P$ removed. On the other hand, the number of children of $X$ is also $d(u)$. Using this observation, we analyze the total running time of the algorithm as $\sum_{X \in \mathcal{T}} O(1 + |ch(X)|) = O(|\mathcal{T}|)$, where $ch(X)$ is the set of children of a recursive call $X$. Since the number of recursive calls and the number of paths starting from $s$ are equal, the algorithm achieves average constant delay.
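The following sketch (ours) captures the recursive structure of Algorithm 9; for simplicity it scans all neighbors of the endpoint instead of maintaining the reduced graph, so it matches the $O(\Delta)$ per-call bound rather than the amortized constant one:

```python
def enumerate_paths(adj, s):
    """Output every simple path starting at s, one per recursive call."""
    path, on_path = [s], {s}

    def rec(u):
        print(path)                       # the current path is a solution
        for w in adj[u]:                  # each unused neighbor spawns a child
            if w not in on_path:
                path.append(w); on_path.add(w)
                rec(w)
                on_path.remove(w); path.pop()

    rec(s)
```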
6.1.2 Enumerating all trees containing $r$
As a next example, we consider the enumeration of all trees containing a specified vertex $r$, as shown in Algorithm 10. In this algorithm, the time complexity of each recursive call is $O(\Delta)$ time; more precisely, the running time is proportional to the degree of the vertex into which the current tree has been contracted. Although the contraction notation $G/e$ is used in the algorithm, its definition is as follows. For an edge $e = \{u, v\}$, we denote with $G/e$ the graph obtained by contracting edge $e$, that is, by unifying the vertices $u$ and $v$ into one. For a set of edges $F = \{e_1, \ldots, e_k\}$, we write $G/F$ to denote $G/e_1/\cdots/e_k$ in short. Since the number of recursive calls and the number of solutions are equal, the average time complexity boils down to $O(\Delta)$ time in a straightforward analysis.
We again improve the average time complexity using an amortized analysis. In Algorithm 10, the number of children of a recursive call $X$ is the degree of the vertex $r$ (after the contractions performed so far). Let $ch(X)$ be the set of children of $X$. When we generate a child $X_i$ by contracting an edge $e_i = \{r, u_i\}$, we need time proportional to $d(u_i)$. On the other hand, the number of children of $X_i$ is also proportional to $d(u_i)$. In other words, the computation time required to generate $X_i$ is bounded by the number of children of $X_i$. The time complexity of each recursive call $X$ is thus bounded by $O(1 + |ch(X)| + |gch(X)|)$, where $gch(X)$ is the set of grandchildren of $X$. Hence, $\sum_{X \in \mathcal{T}} O(1 + |ch(X)| + |gch(X)|) = O(|\mathcal{T}|)$ holds and the algorithm runs in average constant delay.
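A sketch of the branching (ours) can avoid explicit contraction by keeping the current tree as an explicit edge set; Algorithm 10 instead works by contraction, which is what the amortized analysis above charges for, while this version favors readability. Each recursive call outputs one tree, and its children correspond to the allowed frontier edges:

```python
def enumerate_trees(adj_edges, r):
    """Edge sets of all subtrees containing r (including the trivial tree {r}).
    adj_edges[v]: list of (edge_id, other_endpoint) pairs."""
    tree_edges, in_tree, banned = [], {r}, set()

    def frontier():
        return [(eid, w) for u in list(in_tree) for eid, w in adj_edges[u]
                if w not in in_tree and eid not in banned]

    def rec():
        print(list(tree_edges))                 # one tree per recursive call
        used = []
        for eid, w in frontier():
            tree_edges.append(eid); in_tree.add(w)
            rec()                               # trees that contain eid
            in_tree.remove(w); tree_edges.pop()
            banned.add(eid); used.append(eid)   # later branches exclude eid
        for eid in used:
            banned.remove(eid)

    rec()
```

Note the standard partition trick: the $i$-th child contains the $i$-th frontier edge and excludes the earlier ones, so no tree is produced twice.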
6.2 Push out amortization
In the previous subsections, we improved the average delay of some algorithms by considering the relationship between the number of children and grandchildren, and the computation time of each recursive call. In principle, this method can be generalized to account for all descendants. However, analyzing the total number of descendants and their computation time makes the analysis extremely complex. To address this, we introduce a technique called push out amortization [70]. In push out amortization, if all recursive calls satisfy a condition known as the PO condition, we can distribute the computational cost proportionately across all descendants. As a result, a better upper bound can be given. The definition of the PO condition is as follows: $\sum_{C \in ch(X)} T(C) \geq \alpha\, T(X) - \beta\, T^*$, where $T(X)$ is the computation time of a recursive call $X$, $ch(X)$ is the set of its children, $T^*$ is an upper bound on the computation time of a leaf recursive call, and $\alpha > 1$ and $\beta \geq 0$ are constants that do not depend on the input. If all recursive calls satisfy the PO condition, the average delay of the algorithm is $O(T^*)$.
Intuitively, if the PO condition holds at every internal node, it indicates that the computation time for the entire recursion tree is dominated by the computation time of its leaves. Typically, the number of leaves and the number of solutions are equivalent, and thus the average delay is bounded by $O(T^*)$ time.
Informally speaking, one of the advantages of push out amortization is that, when aiming for constant average-delay, a linear time computation in each recursive call is often not a concern. In contrast, when using amortized analysis based on the number of children or grandchildren, proving constant average-delay becomes challenging if each recursive call takes linear time.
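As a toy illustration (ours), the PO condition can be checked mechanically on an explicitly given recursion tree:

```python
def satisfies_po(children, T, T_star, alpha, beta):
    """children: dict mapping each call to its child calls; T: dict of
    computation times. Checks sum_{C in ch(X)} T(C) >= alpha*T(X) - beta*T_star
    at every internal call X."""
    return all(sum(T[c] for c in cs) >= alpha * T[x] - beta * T_star
               for x, cs in children.items() if cs)
```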
In the following, we provide analysis using push out amortization while examining several concrete examples.
6.2.1 Matching Enumeration
Algorithm 11 shows a simple partition-based matching enumeration algorithm. In this algorithm, each recursive call requires time linear in the number of edges of its input graph. We note that, if the maximum degree is at most two, each recursive call can be done in constant time. Thus, in what follows, we consider recursive calls whose input graph has maximum degree more than two.
From the definition of the big $O$ notation, for each recursive call $X$ on a graph with $m$ edges, the computation time is at most $c \cdot m$ for some constant $c$. Hereafter, we assume that each such recursive call spends exactly $c \cdot m$ time. This modification does not increase the time complexity of Algorithm 11. The purpose of this assumption is to simplify the analysis of a lower bound on the computation time; in push out amortization, a lower bound on the computation time of the children is crucial in the context of the PO condition.
We show that the tree structure generated by Algorithm 11 satisfies the PO condition. Let $X$ be a recursive call on a graph $G$ with $m$ edges, let $e = \{u, v\}$ be the edge chosen by the partition, and let $ch(X)$ be the set of children of $X$. The child that receives $G \setminus e$ as its input graph spends $c(m - 1)$ time, which already accounts for almost all of $T(X) = c\,m$. We then consider the remaining part of the PO condition, i.e., the contribution of the child obtained by removing $u$, $v$, and their incident edges: since the maximum degree is more than two, $G$ has an edge $e'$ such that $\{e, e'\}$ is a matching, so this child receives a graph with at least one edge. Together, the children satisfy $\sum_{C \in ch(X)} T(C) \geq \alpha\, T(X) - \beta\, T^*$ for suitable constants $\alpha > 1$ and $\beta \geq 0$, so the PO condition holds for all recursive calls. Since $T^*$ is constant, push out amortization proves that Algorithm 11 runs in constant average delay.
6.2.2 Enumerating connected elimination orderings
In the previous example, we were able to enumerate matchings in average constant delay, even when each recursive call required linear time. By using push out amortization, we can show that an algorithm can still run in constant average delay even when the computation time of each recursive call exceeds linear time, as long as the instance size received by the children is sufficiently large.
An order $v_1, \ldots, v_n$ of the vertices is a connected elimination ordering if for any $i$, the graph $G[\{v_i, \ldots, v_n\}]$ is connected. Let us consider the connected elimination ordering algorithm presented in Algorithm 12. We note that, for any connected graph, there are at least two vertices $u$ and $v$ such that $G \setminus u$ and $G \setminus v$ are connected (as chosen in Line 2). This can be shown using the BC-tree (as defined in Section 4.3.2): indeed, for any tree, removing a leaf does not break connectivity.
Each recursive call of Algorithm 12 can be performed in linear time, using a linear-time biconnected component decomposition algorithm. Still, such a simple algorithm demands $O(n + m)$ time for each recursive call. We show that even if each recursion takes $O(n + m)$ time, Algorithm 12 runs in constant average delay using push out amortization.
Let $X$ be a recursive call and let $X_1$ and $X_2$ be the children of $X$. From the definition of $u$ and $v$, these recursive calls receive a connected graph with $n - 1$ vertices, so $T(X_1) + T(X_2) \geq 2c(n - 1)$ for the constant $c$ in the per-call time bound. Moreover, since the received graphs are connected and lose only the edges incident to $u$ or $v$, their total size remains comparable to that of the input of $X$. By setting $\alpha$ and $\beta$ appropriately, and assuming $n$ is larger than a suitable constant, $\sum_{C \in ch(X)} T(C) \geq \alpha\, T(X) - \beta\, T^*$ and the PO condition hold. Therefore, this algorithm runs in constant average delay as well.
6.2.3 Enumerating all perfect elimination orderings
As another example of an enumeration problem related to orderings, we consider the Perfect Elimination Ordering (PEO) enumeration problem (for fundamental facts about chordal graphs, we refer the reader to [7]). As mentioned in Section 5.3.3, if a graph $G$ is chordal, that is, it has no induced cycle of length at least four, then $G$ has an ordering $v_1, \ldots, v_n$ such that for any $i$, $v_i$ is simplicial in $G[\{v_i, \ldots, v_n\}]$, called a perfect elimination ordering. It is known that a chordal graph has at least two simplicial vertices [7]. Therefore, the procedure shown in Algorithm 13 correctly works.
The enumeration of all simplicial vertices can be done in polynomial time. Since the number of simplicial vertices (and thus of children of a recursive call) is at least two, we can again achieve average constant delay by using push out amortization. Indeed, in each recursive call the computational cost of the simplicial-vertex enumeration is proportional to the number of children generated; since $\sum_{C \in ch(X)} T(C) \geq \alpha\, T(X) - \beta\, T^*$ then holds for suitable constants, the PO condition holds as well.
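For reference, testing simpliciality is a one-liner per vertex, which is all that Algorithm 13's branching needs (a sketch, ours; faster dedicated procedures exist):

```python
from itertools import combinations

def simplicial_vertices(adj):
    """Vertices whose neighborhood induces a clique (adj: list of neighbor sets)."""
    return [v for v in range(len(adj))
            if all(w in adj[u] for u, w in combinations(adj[v], 2))]
```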
6.2.4 Spanning tree enumeration
Finally, we introduce an algorithm that takes advantage of the ability to use linear time in each recursive call. Specifically, we present an amortized constant-time algorithm for enumerating spanning trees.
In spanning tree enumeration, we partition the spanning trees into the ones containing an edge $e$, and the ones that do not contain $e$. Note that if $e$ is a bridge of $G$ (i.e., its removal disconnects the graph), then all spanning trees contain $e$. In such a case, the number of subproblems is just one, and it is thus not easy to satisfy the PO condition.
Since we want to avoid this situation, we can enumerate all bridges, which takes linear time using Tarjan's bridge-finding algorithm [64], and contract them. Moreover, to further simplify the algorithm, hereafter we assume that $G$ can have parallel edges due to edge contractions (that is, it is a multigraph), and we keep as an invariant that it has no bridges, that is, it is two-edge connected. Additionally, when $G$ has edges parallel to $e = \{u, v\}$, we can generate several subproblems easily. We denote the set of edges between $u$ and $v$ by $E(u, v)$.
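The bridge detection mentioned above can be sketched with the classical DFS low-link computation (ours; a simple recursive variant). Note the multigraph handling: edges are skipped by identifier rather than by endpoint, so one of two parallel edges is never misread as a bridge:

```python
def bridges(n, edges):
    """Tarjan-style linear-time bridge finding on a multigraph.
    edges: list of (u, v) pairs; vertices are 0..n-1."""
    adj = [[] for _ in range(n)]
    for eid, (u, v) in enumerate(edges):
        adj[u].append((eid, v))
        adj[v].append((eid, u))
    disc, low, out = [-1] * n, [0] * n, []
    timer = 0

    def dfs(u, in_eid):
        nonlocal timer
        disc[u] = low[u] = timer; timer += 1
        for eid, w in adj[u]:
            if eid == in_eid:
                continue                      # skip only the arrival edge itself
            if disc[w] == -1:
                dfs(w, eid)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:          # nothing below w climbs above u
                    out.append(eid)
            else:
                low[u] = min(low[u], disc[w])

    for v in range(n):
        if disc[v] == -1:
            dfs(v, -1)
    return out
```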
In our algorithm, to ensure a sufficient number of children, we generate $|E(u, v)| + 1$ children. For each edge $e_i \in E(u, v)$, we consider the following subproblem: enumerating all spanning trees that contain $e_i$; that is, we consider the graph where the edges of $E(u, v)$ other than $e_i$ are removed and $e_i$ is contracted. The remaining child enumerates the spanning trees containing no edge of $E(u, v)$. Such subproblems can be generated in linear time; see Algorithm 14 for the details. Thus, the time complexity of each recursive call is $O(n + m)$ time. Finally, we show that this algorithm satisfies the PO condition. When we contract an edge of $E(u, v)$, the number of edges is reduced by $|E(u, v)|$, as the other edges of $E(u, v)$ become self-loops and disappear. However, since the number of children is $|E(u, v)| + 1$, we can amortize this cost using the $\beta T^*$ term of the PO condition; in particular, each recursive call has at least two children. Finally, since each edge not in $E(u, v)$ is contained in at least two subproblems, this algorithm satisfies the PO condition and runs in constant average delay.
6.3 Geometric amortization: The gap between delay and average delay
In the analysis of average delay, we often encounter unintuitive results due to the exponential number of solutions compared to the input size. This arises because the majority of the recursive calls are performed over inputs of constant size. When improving the average delay, focusing on the relationship between the number of solutions and the computation time often leads to tighter upper bounds on the overall time complexity. However, this approach does not easily extend to delay. Indeed, the class of problems solvable with polynomial delay and the class of problems solvable with average polynomial delay are distinct under standard complexity assumptions, although the separation is slightly artificial. For more about this, see Proposition 3.10 of [58].
To bridge the gap between these two classes, the class of problems solvable with linear incremental delay has been proposed as an intermediate complexity class. An algorithm runs in linear incremental delay if the time required to output the $i$-th solution is bounded by $O(i \cdot \mathrm{poly}(n))$. We introduce here geometric amortization [14], which allows us to prove an important result: the class of problems that can be solved with incremental polynomial delay and polynomial space is equivalent to the class of problems solvable with polynomial delay and polynomial space. The key idea behind geometric amortization is to run multiple instances of the incremental polynomial delay algorithm in parallel on multiple (simulated) machines. By carefully scheduling these machines, we can ensure that solutions are output with polynomial delay and in polynomial space. Capelli and Strozecki published a demo illustrating this idea, which may help in understanding the core concept [13, 14]. Since the space usage only increases proportionally to the number of parallel machines, we can achieve polynomial delay while maintaining polynomial space complexity.
One of the applications of geometric amortization is converting enumeration algorithms with cardinality constraints into $k$-best or ranked enumeration algorithms. Ranked enumeration is the problem of outputting solutions in order of their objective function values, from best to worst [35]. A typical technique to design ranked enumeration algorithms is to combine partition techniques with optimization algorithms [48]. This approach typically demands exponential space. However, by using geometric amortization, we can obtain polynomial-delay and polynomial-space ranked enumeration algorithms, if we have an incremental polynomial-delay enumeration algorithm with cardinality constraints [45].
7 Hardness of Enumeration Problems
In the previous sections, we have focused on methods for constructing efficient enumeration algorithms. However, just as with ordinary search problems, there is no guarantee that an efficient algorithm always exists. In this section, we give a brief introduction to how to deal with the “hardness” of enumeration algorithms. Moreover, we also discuss fixed-parameter tractability and approximation. This topic is currently an active area of research [30].
One of the most important open questions in the enumeration field is whether there exists an output polynomial time algorithm for the enumeration of all minimal dominating sets (Dom-Enum for short). A dominating set of a graph $G$ is a set $D$ of vertices such that each node of the graph is either in $D$, or is a neighbor of a node of $D$. Although such algorithms have been proven to exist for many special cases, including bipartite graphs, the problem remains open for general graphs, comparability graphs, and unit-disk graphs. As in the case of showing the NP-completeness of a decision problem, if we can solve an enumeration problem that is "equivalent" to Dom-Enum, then we can also solve Dom-Enum. In that sense, introducing a notion of reduction is crucial. Currently, the following notion is adopted:
Definition 14 (E.g. [61]).
Let $A$ and $B$ be two enumeration problems. We say that there is a polynomial-time delay reduction from $A$ to $B$ if there exist polynomial-time computable functions $f$ and $g$ such that:
-
1.
$f$ maps an instance $x$ of $A$ to a set of instances of $B$;
-
2.
$g$ maps a solution of an instance in $f(x)$ to a set of solutions of $x$;
-
3.–5.
three further technical conditions ensuring that the solution sets produced by $g$ from the instances in $f(x)$ cover every solution of $x$ exactly once, and can be produced with polynomial delay; we refer to [61] for the precise statement.
By using such reductions, it has been shown that Dom-Enum is equivalent to the enumeration of the minimal transversals of a hypergraph (Hyper-Trans, for short), where a hypergraph transversal is a set of vertices that intersects every hyperedge, in the sense that if one can be solved with polynomial delay, then the other can also be solved with polynomial delay [39]. Furthermore, it is also known that the dualization of monotone boolean functions (Dualization, for short) is equivalent to Hyper-Trans. In other words, problems that arise in different contexts – such as graphs, set families, and logical formulas – turn out to be equivalent to each other.
7.1 NP-hard Enumeration Problems
Although it is unknown whether Dualization can be solved in output polynomial time, there is a quasi-polynomial time algorithm for Dualization [33]. Do harder enumeration problems exist? One obvious example is the enumeration of the solutions of an NP-hard problem, e.g., the enumeration of the minimum solutions of Vertex Cover.
Another example is the problem of enumerating all minimal dicuts (Dicuts, for short). Let $G = (V, E)$ be a strongly connected directed graph. A set of edges $C \subseteq E$ is called a directed cut or dicut if removing $C$ from $G$ destroys its strong connectivity. A dicut $C$ is minimal if there is no proper subset $C'$ of $C$ such that $C'$ is a dicut. It is obvious that finding one minimal dicut is easy. However, enumerating more minimal dicuts is not trivial. Indeed, Dicuts is known to be NP-hard [41]. More precisely, given a family $\mathcal{F}$ of minimal dicuts of $G$, deciding whether there is a minimal dicut not contained in $\mathcal{F}$ is NP-complete. Similarly to Dicuts, the enumeration of inclusion-wise minimal separators – minimal sets of edges that satisfy a certain connectivity requirement – in a directed graph is also NP-hard [8].
Techniques for proving the NP-hardness of an enumeration problem $A$ often involve constructing an instance of $A$ that corresponds to an instance of some NP-complete problem $B$. Typically, obtaining several solutions of $A$ is easy. Thus, we design the instance of $A$ such that it contains only a polynomial number of "trivial" solutions, and if an enumeration algorithm finds a non-trivial solution, then this implies that the instance of $B$ has a feasible solution. With this setup, NP-hardness can often be established.
7.2 Introduction to parameterized enumeration
In the previous section, we talked about problems for which finding a single solution is easy, but enumerating all solutions is difficult. In this section and the next, we introduce two approaches to deal with problems for which it is difficult even to obtain one solution: parameterized enumeration and approximate enumeration.
Parameterized enumeration is an approach where, in addition to the usual enumeration problem, we also take into account a parameter related to the problem. An example is the following: output all minimal vertex covers (i.e., inclusion-minimal sets of vertices such that all edges of the graph have at least one endpoint in the cover) of size at most $k$, treating $k$ as a parameter. As shown later, we can "efficiently" solve this problem. Now, we first need to define the efficiency of enumeration for a problem with a parameter. In the same way that we define the class FPT for parameterized search problems, we define DelayFPT for parameterized enumeration problems, as follows:
Definition 15 (DelayFPT, [26]).
Let $E$ be an enumeration problem with parameter $k$. Then, an enumeration algorithm $\mathcal{A}$ for $E$ is a DelayFPT-algorithm if there exist a computable function $t$ and a polynomial $p$ such that for every instance $(x, k)$, $\mathcal{A}$ outputs all the solutions of $(x, k)$ with delay at most $t(k) \cdot p(|x|)$.
In the same work, we can also find the definitions of IncFPT and TotalFPT [26].
Meier's Habilitation thesis [51] studies the topic of parameterized enumeration in detail. Subsequently, Golovach et al. [36] introduced the notion of a fully-polynomial enumeration kernel. Intuitively, an enumeration kernel is a framework in which, given an instance of a parameterized enumeration problem, we transform it into a smaller instance with respect to the parameter, enumerate the solutions of this smaller instance, and then reconstruct the solutions for the original instance from those solutions. The algorithm that performs the transformation is called the kernelization algorithm $\mathcal{K}$, and the algorithm that reconstructs the solutions for the original instance is called the solution-lift algorithm $\mathcal{L}$.
Definition 16.
For an enumeration problem $E$ with parameter $k$, if $\mathcal{K}$ and $\mathcal{L}$ satisfy the following conditions (1) and (2), the pair $(\mathcal{K}, \mathcal{L})$ is called a fully-polynomial enumeration kernel.
-
1.
For every instance $(x, k)$ of $E$, $\mathcal{K}$ outputs in time polynomial in $|x| + k$ an instance $(y, k')$ of $E$ such that $|y| + k' \leq f(k)$ for some computable function $f$.
-
2.
For every solution $s$ of $(y, k')$, $\mathcal{L}$ outputs with delay polynomial in $|x| + k$ a set $S_s$ of solutions of $(x, k)$ such that the family $\{S_s \mid s \text{ is a solution of } (y, k')\}$ is a partition of the set of all solutions of $(x, k)$.
Interestingly, the concept of a fully-polynomial enumeration kernel characterizes the existence of a polynomial-delay algorithm. More specifically, Golovach et al. [36] proved that there is a polynomial-delay algorithm for an enumeration problem if and only if there is a fully-polynomial enumeration kernel for the problem whose output instance has constant size.
7.3 Introduction to approximate enumeration
Among NP-hard enumeration problems, the most obviously NP-hard ones are the problems where even finding one solution is already NP-hard, for instance the enumeration of subgraphs with cardinality constraints, such as in $k$-best or ranked enumeration problems. For these problems, enumeration is trivially NP-hard.
In the field of optimization algorithms, in such cases, we often develop algorithms that allow for some error in the output, or permit an exponential computation time with respect to a certain parameter. As mentioned in the previous section, several approaches, such as FPT delay algorithms, have been proposed [24, 25]. In this section, we introduce approximate enumeration algorithms, which constitute an alternative approach for addressing NP-hard enumeration problems. To the best of the authors' knowledge, the paper [29] is the first to apply an approximation-algorithm approach to enumeration problems, including those involving NP-hard optimization problems. This approach is an approximate version of ranked enumeration. Let $w$ be an objective function, and let us consider a minimization problem: that is, we want to enumerate all solutions in an order $S_1, S_2, \ldots$ such that $w(S_i) \leq w(S_j)$ for any $i < j$. An order of solutions is called a $c$-approximate order if $w(S_i) \leq c \cdot w(S_j)$ for any $i < j$. Additionally, Kobayashi et al. have proposed a relaxation based on the objective value rather than on an ordering [44]. These algorithms are called $c$-approximation algorithms. If we allow such a relaxation, we can find one solution in polynomial time.
Research on algorithms in this area is also largely based on two approaches: partition techniques and solution graph-based algorithms [1, 42]. In solution graph-based algorithms, enumeration problems that add objective function constraints to input-restricted problems play a crucial role, similar to maximal solution enumeration algorithms [44].
8 Conclusions
In this work, we provided a survey of the relatively recent but ever growing topic of output-sensitive enumeration algorithm design for graph problems. We have presented the most common techniques of the area, mostly divided between partition-type algorithms and solution graph-based techniques. We have also provided notions of amortization analysis, showing how a careful study of the structure of the algorithm can lead to low average delay. Finally, we have introduced the concept of hardness for enumeration problems, with some preliminary notions of parameterized and approximate enumeration.
References
- [1] Zahi Ajami and Sara Cohen. Enumerating minimal weight set covers. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 518–529, 2019. doi:10.1109/ICDE.2019.00053.
- [2] Eralp A. Akkoyunlu. The enumeration of maximal cliques of large graphs. SIAM J. Comput., 2(1):1–6, 1973. doi:10.1137/0202001.
- [3] David Avis and Komei Fukuda. Reverse search for enumeration. Discrete Applied Mathematics, 65(1-3):21–46, March 1996. doi:10.1016/0166-218x(95)00026-n.
- [4] Giulia Bernardini, Huiping Chen, Gabriele Fici, Grigorios Loukides, and Solon P. Pissis. Reverse-safe data structures for text indexing. In 2020 Proceedings of the Twenty-Second Workshop on Algorithm Engineering and Experiments (ALENEX), pages 199–213. SIAM, 2020. doi:10.1137/1.9781611976007.16.
- [5] Etienne Birmelé, Pierluigi Crescenzi, Rui A. Ferreira, Roberto Grossi, Vincent Lacroix, Andrea Marino, Nadia Pisanti, Gustavo Akio Tominaga Sacomoto, and Marie-France Sagot. Efficient bubble enumeration in directed graphs. In String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, pages 118–129, 2012. doi:10.1007/978-3-642-34109-0_13.
- [6] Etienne Birmelé, Rui A. Ferreira, Roberto Grossi, Andrea Marino, Nadia Pisanti, Romeo Rizzi, and Gustavo Sacomoto. Optimal listing of cycles and st-paths in undirected graphs. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, pages 1884–1896, 2013. doi:10.1137/1.9781611973105.134.
- [7] Jean R. S. Blair and Barry Peyton. An introduction to chordal graphs and clique trees. In Alan George, John R. Gilbert, and Joseph W. H. Liu, editors, Graph Theory and Sparse Matrix Computation, pages 1–29, New York, NY, 1993. Springer New York.
- [8] Caroline Brosse, Oscar Defrain, Kazuhiro Kurita, Vincent Limouzy, Takeaki Uno, and Kunihiro Wasa. On the hardness of inclusion-wise minimal separators enumeration. Inf. Process. Lett., 185:106469, March 2024. doi:10.1016/j.ipl.2023.106469.
- [9] Caroline Brosse, Aurélie Lagoutte, Vincent Limouzy, Arnaud Mary, and Lucas Pastor. Efficient enumeration of maximal split subgraphs and induced sub-cographs and related classes. Discrete Applied Mathematics, 345:34–51, 2024. doi:10.1016/J.DAM.2023.10.025.
- [10] Caroline Brosse, Vincent Limouzy, and Arnaud Mary. Polynomial Delay Algorithm for Minimal Chordal Completions. In Mikołaj Bojańczyk, Emanuela Merelli, and David P. Woodruff, editors, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), volume 229 of Leibniz International Proceedings in Informatics (LIPIcs), pages 33:1–33:16, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ICALP.2022.33.
- [11] Filippo Brunelli, Alessio Conte, Roberto Grossi, and Andrea Marino. Output-sensitive enumeration of maximal cliques in temporal graphs. Discrete Applied Mathematics, 369:66–77, 2025. doi:10.1016/j.dam.2025.02.025.
- [12] Yixin Cao. Enumerating maximal induced subgraphs, 2020. arXiv:2004.09885.
- [13] Florent Capelli and Yann Strozecki. https://florent.capelli.me/coussinet/. Accessed: 2025-02-20.
- [14] Florent Capelli and Yann Strozecki. Geometric Amortization of Enumeration Algorithms. In Petra Berenbrink, Patricia Bouyer, Anuj Dawar, and Mamadou Moustapha Kanté, editors, 40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023), volume 254 of Leibniz International Proceedings in Informatics (LIPIcs), pages 18:1–18:22, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.STACS.2023.18.
- [15] Sara Cohen, Benny Kimelfeld, and Yehoshua Sagiv. Generating all maximal induced subgraphs for hereditary and connected-hereditary graph properties. J. Comput. Syst. Sci., 74(7):1147–1159, 2008. doi:10.1016/J.JCSS.2008.04.003.
- [16] Alessio Conte, Roberto Grossi, Mamadou Moustapha Kanté, Andrea Marino, Takeaki Uno, and Kunihiro Wasa. Listing induced steiner subgraphs as a compact way to discover steiner trees in graphs. In Peter Rossmanith, Pinar Heggernes, and Joost-Pieter Katoen, editors, 44th International Symposium on Mathematical Foundations of Computer Science, MFCS 2019, August 26-30, 2019, Aachen, Germany, volume 138 of LIPIcs, pages 73:1–73:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.MFCS.2019.73.
- [17] Alessio Conte, Roberto Grossi, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, and Giulia Punzi. Beyond the best theorem: Fast assessment of eulerian trails. In Fundamentals of Computation Theory: 23rd International Symposium, FCT 2021, Athens, Greece, September 12–15, 2021, Proceedings 23, pages 162–175. Springer, 2021. doi:10.1007/978-3-030-86593-1_11.
- [18] Alessio Conte, Roberto Grossi, Andrea Marino, and Romeo Rizzi. Enumerating cyclic orientations of a graph. In Zsuzsanna Lipták and William F. Smyth, editors, Combinatorial Algorithms - 26th International Workshop, IWOCA 2015, Verona, Italy, October 5-7, 2015, Revised Selected Papers, volume 9538 of Lecture Notes in Computer Science, pages 88–99. Springer, 2015. doi:10.1007/978-3-319-29516-9_8.
- [19] Alessio Conte, Roberto Grossi, Andrea Marino, Takeaki Uno, and Luca Versari. Proximity search for maximal subgraph enumeration. SIAM Journal on Computing, 51(5):1580–1625, 2022. doi:10.1137/20M1375048.
- [20] Alessio Conte, Roberto Grossi, Andrea Marino, and Luca Versari. Listing maximal subgraphs satisfying strongly accessible properties. SIAM J. Discrete Math., 33(2):587–613, 2019. doi:10.1137/17M1152206.
- [21] Alessio Conte, Mamadou Moustapha Kanté, Yota Otachi, Takeaki Uno, and Kunihiro Wasa. Efficient enumeration of maximal k-degenerate induced subgraphs of a chordal graph. Theoretical Computer Science, 818:2–11, 2020. doi:10.1016/J.TCS.2018.08.009.
- [22] Alessio Conte and Etsuji Tomita. On the overall and delay complexity of the CLIQUES and bron-kerbosch algorithms. Theor. Comput. Sci., 899:1–24, 2022. doi:10.1016/J.TCS.2021.11.005.
- [23] Alessio Conte and Takeaki Uno. New polynomial delay bounds for maximal subgraph enumeration by proximity search. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 1179–1190, New York, NY, USA, 2019. ACM. doi:10.1145/3313276.3316402.
- [24] Nadia Creignou, Raïda Ktari, Arne Meier, Julian-Steffen Müller, Frédéric Olive, and Heribert Vollmer. Parameterized enumeration for modification problems. In Adrian-Horia Dediu, Enrico Formenti, Carlos Martín-Vide, and Bianca Truthe, editors, Language and Automata Theory and Applications, pages 524–536, Cham, 2015. Springer International Publishing. doi:10.1007/978-3-319-15579-1_41.
- [25] Nadia Creignou, Raïda Ktari, Arne Meier, Julian-Steffen Müller, Frédéric Olive, and Heribert Vollmer. Parameterised enumeration for modification problems. Algorithms, 12(9):189, 2019. doi:10.3390/A12090189.
- [26] Nadia Creignou, Arne Meier, Julian-Steffen Müller, Johannes Schmidt, and Heribert Vollmer. Paradigms for Parameterized Enumeration. Theory Comput. Syst., 60(4):737–758, May 2017. doi:10.1007/s00224-016-9702-4.
- [27] David Eppstein. Finding the k shortest paths. SIAM J. Comput., 28(2):652–673, 1998. doi:10.1137/S0097539795290477.
- [28] David Eppstein, Maarten Löffler, and Darren Strash. Listing all maximal cliques in sparse graphs in near-optimal time. In Algorithms and Computation: 21st International Symposium, ISAAC 2010, Jeju Island, Korea, December 15-17, 2010, Proceedings, Part I 21, pages 403–414. Springer, 2010. doi:10.1007/978-3-642-17517-6_36.
- [29] Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. In Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’01, pages 102–113, New York, NY, USA, 2001. Association for Computing Machinery. doi:10.1145/375551.375567.
- [30] Henning Fernau, Petr Golovach, Marie-France Sagot, et al. Algorithmic enumeration: Output-sensitive, input-sensitive, parameterized, approximative (dagstuhl seminar 18421). Dagstuhl Reports, 8(10):63–86, 2019. doi:10.4230/DAGREP.8.10.63.
- [31] Fedor V. Fomin, Fabrizio Grandoni, and Dieter Kratsch. A measure & conquer approach for the analysis of exact algorithms. Journal of the ACM (JACM), 56(5):25, 2009.
- [32] Fedor V. Fomin and Dieter Kratsch. Exact Exponential Algorithms. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2010. doi:10.1007/978-3-642-16533-7.
- [33] Michael L. Fredman and Leonid Khachiyan. On the complexity of dualization of monotone disjunctive normal forms. J. Algorithms, 21(3):618–628, November 1996. doi:10.1006/jagm.1996.0062.
- [34] Leslie Ann Goldberg. Efficient algorithms for listing combinatorial structures, 1992. PhD dissertation thesis.
- [35] Konstantin Golenberg, Benny Kimelfeld, and Yehoshua Sagiv. Optimizing and parallelizing ranked enumeration. Proc. VLDB Endow., 4(11):1028–1039, August 2011. doi:10.14778/3402707.3402739.
- [36] Petr A Golovach, Christian Komusiewicz, Dieter Kratsch, and Van Bang Le. Refined notions of parameterized enumeration kernels with applications to matching cut enumeration. J. Comput. System Sci., 123:76–102, February 2022. doi:10.1016/j.jcss.2021.07.005.
- [37] David S. Johnson, Christos H. Papadimitriou, and Mihalis Yannakakis. On generating all maximal independent sets. Inf. Process. Lett., 27(3):119–123, 1988. doi:10.1016/0020-0190(88)90065-8.
- [38] Donald B. Johnson. Finding all the elementary circuits of a directed graph. SIAM J. Comput., 4(1):77–84, 1975. doi:10.1137/0204007.
- [39] Mamadou Moustapha Kanté, Vincent Limouzy, Arnaud Mary, and Lhouari Nourine. On the Enumeration of Minimal Dominating Sets and Related Notions. SIAM J. Discrete Math., 28(4):1916–1929, January 2014. doi:10.1137/120862612.
- [40] Toshinobu Kashiwabara, Sumio Masuda, Kazuo Nakajima, and Toshio Fujisawa. Generation of maximum independent sets of a bipartite graph and maximum cliques of a circular-arc graph. J. Algorithms, 13(1):161–174, 1992. doi:10.1016/0196-6774(92)90012-2.
- [41] Leonid Khachiyan, Endre Boros, Khaled Elbassioni, and Vladimir Gurvich. On Enumerating Minimal Dicuts and Strongly Connected Subgraphs. Algorithmica, 50(1):159, October 2007. doi:10.1007/s00453-007-9074-x.
- [42] Benny Kimelfeld and Yehoshua Sagiv. Finding and approximating top-k answers in keyword proximity search. In Proceedings of the Twenty-Fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’06, pages 173–182, New York, NY, USA, 2006. Association for Computing Machinery. doi:10.1145/1142351.1142377.
- [43] Donald E. Knuth. The art of computer programming, volume 4, pre-fascicle 2a: A draft of section 7.2.1.1: Generating all n-tuples, 2001.
- [44] Yasuaki Kobayashi, Kazuhiro Kurita, and Kunihiro Wasa. Efficient constant-factor approximate enumeration of minimal subsets for monotone properties with weight constraints. Discret. Appl. Math., 361:258–275, 2025. doi:10.1016/J.DAM.2024.10.014.
- [45] Yasuaki Kobayashi, Kazuhiro Kurita, and Kunihiro Wasa. Polynomial-delay enumeration of large maximal common independent sets in two matroids and beyond. Information and Computation, 304:105282, 2025. doi:10.1016/j.ic.2025.105282.
- [46] Kazuhiro Kurita, Kunihiro Wasa, Hiroki Arimura, and Takeaki Uno. Efficient enumeration of dominating sets for sparse graphs. Discret. Appl. Math., 303:283–295, 2021. doi:10.1016/J.DAM.2021.06.004.
- [47] Kazuhiro Kurita, Kunihiro Wasa, Takeaki Uno, and Hiroki Arimura. A constant amortized time enumeration algorithm for independent sets in graphs with bounded clique number. Theor. Comput. Sci., 874:32–41, 2021. doi:10.1016/J.TCS.2021.05.008.
- [48] Eugene L. Lawler. A procedure for computing the k best solutions to discrete optimization problems and its application to the shortest path problem. Management Science, 18(7):401–405, 1972. URL: http://www.jstor.org/stable/2629357.
- [49] Eugene L. Lawler, Jan Karel Lenstra, and Alexander H.G. Rinnooy Kan. Generating all maximal independent sets: NP-hardness and polynomial-time algorithms. SIAM Journal on Computing, 9(3):558–565, 1980. doi:10.1137/0209042.
- [50] Kazuhisa Makino and Takeaki Uno. New algorithms for enumerating all maximal cliques. In Proceedings of the 9th Scandinavian Workshop on Algorithm Theory (SWAT 2004), volume 3111 of Lecture Notes in Computer Science, pages 260–272, Humlebæk, Denmark, July 2004. Springer Berlin Heidelberg. doi:10.1007/978-3-540-27810-8_23.
- [51] Arne Meier. Parametrised enumeration. Habilitation thesis, Leibniz Universität Hannover, 2020. doi:10.15488/9427.
- [52] John W. Moon and Leo Moser. On cliques in graphs. Israel journal of Mathematics, 3:23–28, 1965.
- [53] M. C. Paull and S. H. Unger. Minimizing the number of states in incompletely specified sequential switching functions. IRE Transactions on Electronic Computers, EC-8(3):356–367, 1959. doi:10.1109/TEC.1959.5222697.
- [54] Giulia Punzi, Alessio Conte, Roberto Grossi, and Andrea Marino. An efficient algorithm for assessing the number of st-paths in large graphs. In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), pages 289–297. SIAM, 2023. doi:10.1137/1.9781611977653.CH33.
- [55] Ronald C. Read and Robert E. Tarjan. Bounds on backtrack algorithms for listing cycles, paths, and spanning trees. Networks, 5(3):237–252, 1975. doi:10.1002/NET.1975.5.3.237.
- [56] Marco Rospocher. On the computational complexity of enumerating certificates of np problems, 2006. PhD dissertation thesis.
- [57] Frank Ruskey. Combinatorial generation. Working version (1j-CSC 425/520), 2003.
- [58] Johannes Schmidt. Enumeration: Algorithms and complexity. Preprint, available at https://www.thi.uni-hannover.de/fileadmin/forschung/arbeiten/schmidt-da.pdf, 2009.
- [59] Akiyoshi Shioura, Akihisa Tamura, and Takeaki Uno. An optimal algorithm for scanning all spanning trees of undirected graphs. SIAM J. Comput., 26(3):678–692, 1997. doi:10.1137/S0097539794270881.
- [60] Yann Strozecki. Enumeration complexity. Bull. EATCS, 129, 2019. URL: http://bulletin.eatcs.org/index.php/beatcs/article/view/596/605.
- [61] Yann Strozecki. Enumeration complexity: Incremental time, delay and space. CoRR, abs/2309.17042, 2023. doi:10.48550/arXiv.2309.17042.
- [62] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining, (First Edition). Addison-Wesley Longman Publishing Co., Inc., 2005.
- [63] Robert E. Tarjan. Enumeration of the elementary circuits of a directed graph. SIAM J. Comput., 2(3):211–216, 1973. doi:10.1137/0202017.
- [64] Robert E. Tarjan. A note on finding the bridges of a graph. Information Processing Letters, 2(6):160–161, 1974. doi:10.1016/0020-0190(74)90003-9.
- [65] James C. Tiernan. An efficient search algorithm to find the elementary circuits of a graph. Communications ACM, 13:722–726, 1970. doi:10.1145/362814.362819.
- [66] Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi. The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical computer science, 363(1):28–42, 2006. doi:10.1016/J.TCS.2006.06.015.
- [67] Shuji Tsukiyama, Mikio Ide, Hiromu Ariyoshi, and Isao Shirakawa. A new algorithm for generating all the maximal independent sets. SIAM Journal on Computing, 6(3):505–517, 1977. doi:10.1137/0206036.
- [68] Takeaki Uno. A fast algorithm for enumerating bipartite perfect matchings. In Algorithms and Computation, 12th International Symposium, ISAAC, pages 367–379, 2001. doi:10.1007/3-540-45678-3_32.
- [69] Takeaki Uno. Two general methods to reduce delay and change of enumeration algorithms. National Institute of Informatics (in Japan) Technical Report E, 4:2003, 2003.
- [70] Takeaki Uno. Constant time enumeration by amortization. In Frank Dehne, Jörg-Rüdiger Sack, and Ulrike Stege, editors, Algorithms and Data Structures - 14th International Symposium, WADS 2015, Victoria, BC, Canada, August 5-7, 2015. Proceedings, volume 9214 of Lecture Notes in Computer Science, pages 593–605. Springer, 2015. doi:10.1007/978-3-319-21840-3_49.
- [71] Leslie G. Valiant. The complexity of computing the permanent. Theor. Comput. Sci., 8:189–201, 1979. doi:10.1016/0304-3975(79)90044-6.
- [72] Leslie G. Valiant. The complexity of enumeration and reliability problems. SICOMP, 8(3):410–421, 1979. doi:10.1137/0208032.
- [73] Luca Versari. A new algorithmic framework for enumerating commutable set properties, 2017. Master Thesis, University of Pisa.
