BFS-Based Canonical Codes for Generating Graphs with Constraint Programming
Abstract
We consider the problem of generating all graphs that satisfy some given additional constraints (on vertex degrees, or cycle lengths, for example). Most previous works have proposed to generate canonical codes associated with adjacency matrices. In this paper, we consider canonical codes based on Breadth First Search (BFS), and we show how to generate them with Constraint Programming (CP): we introduce a set of basic constraints that must be satisfied by all canonical codes, thus breaking many symmetries, and we introduce a global constraint to break other symmetries. We illustrate the interest of our approach on connected claw-free cubic graphs, and show that it outperforms state-of-the-art CP and SAT Modulo Theory (SMT) approaches.
Keywords and phrases:
Graph Generation, Automorphisms, Symmetry BreakingCopyright and License:
2012 ACM Subject Classification:
Mathematics of computing Graph enumeration ; Computing methodologies Artificial intelligenceEditors:
Maria Garcia de la BandaSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
In combinatorics and graph theory, graph enumeration involves systematically searching for all graphs that satisfy some given properties (such as a specific number of edges, vertex degree constraints, or structural features like girth). This has numerous applications in fields like chemical informatics and drug discovery, where molecules are modeled as graphs [5, 19]. To prevent us from generating isomorphic graphs, i.e., graphs that are equivalent up to a renaming of their vertices, we may use canonical codes, i.e., words associated with graphs such that two graphs are isomorphic if and only if they have the same canonical code: instead of enumerating graphs, we enumerate canonical codes, thus ensuring that all generated graphs are non-isomorphic.
Canonical codes may be based on adjacency matrices, i.e., 2-dimensional arrays such that if is an edge, and 0 otherwise. A code may be obtained from by concatenating its rows (or columns), thus obtaining a binary word. Since different vertex permutations may produce different adjacency matrices and, therefore, different codes, the lexicographically smallest one is selected as the canonical code, thus uniquely identifying the graph. To avoid exhaustively considering all vertex permutations, symmetry-breaking constraints may be used. In [4], Codish et al. introduce symmetry-breaking constraints that enforce a non-descending row order in the adjacency matrix and ensure minimality under vertex permutations within specific partitioned sets.
The Nauty algorithm [13] generates a canonical labeling through an iterative partition refinement process, where vertices are colored based on their connectivity. Once a discrete partition is reached, it serves as the canonical label. Although there is no direct work that enumerates all canonical codes using Nauty’s labeling as constraints, as far as we know, the concept of symmetry breaking via structural graph information has been applied by Codish et al. [3]. They encode partition refinement into constraints, and introduce additional constraints to maintain minimality in the adjacency matrix while preserving partition structure. This approach provides insights into constructing compact and complete symmetry-breaking constraints, effectively reducing the need to evaluate all vertex permutations. Further studies on complete symmetry breaking are explored in [8, 9].
In addition to the adjacency matrix, Codish et al. demonstrate in [10] how various higher-dimensional graph invariants, such as vertex degrees and the cardinality of common neighbors, can be leveraged to define symmetry-breaking constraints. They compare different combinations of these invariants to evaluate their effectiveness in reducing the search space.
Beyond static symmetry breaking, which relies on a predefined set of permutations, Kirchweger and Szeider introduce in [11] a novel SMT-based approach for graph generation using dynamic symmetry breaking. The idea is to detect symmetries in the partially generated graph to enforce adjacency matrix minimality during the solving process, thus significantly improving efficiency compared to static symmetry-breaking methods.
Another class of canonical codes relies on graph traversals, where codes are sequences of traversed edges. Graph mining algorithms, such as Gaston [15] and gSpan [20], are usually based on Depth-First-Search (DFS). In [18] and [16], canonical codes based on Breadth-First-Search (BFS) are used for particular classes of graphs (Deterministic Finite Automata in [18] and hexagonal graphs in [16]) for which graph isomorphism may be solved in polynomial-time. In [14], some BFS-based symmetry-breaking predicates are introduced for generating connected graphs.
In this paper, we extend the work of [14] and introduce new symmetry breaking constraints for BFS-based canonical codes. Indeed, BFSs naturally construct spanning trees, and the orbits formed by the vertices within a spanning tree can be exploited to derive compact symmetry breaking constraints, that are efficiently handled by CP solvers. Also, every prefix of our canonical code is a canonical code, thus enabling the construction of dynamic symmetry-breaking constraints, similar to the approach in [11].
The paper is organized as follows. In Section 2, we introduce notations. In Section 3, we introduce BFS-based codes, and define a canonical BFS-based code as the smallest possible code. In Section 4, we describe a CP model for generating BFS-based codes. In Sections 5 and 6, we study properties of canonical codes. In Section 7, we introduce a global constraint that exploits these properties to ensure canonicity. In Section 8, we evaluate our method for generating connected claw-free cubic graphs, and compare it with the approach of [10], which relies on an adjacency matrix representation, as well as with the SMT-based approach of [11] and with Nauty [13]. We also introduce a more compact graph representation for this benchmark, thus enabling us to solve larger instances.
2 Notations and definitions
We note the set of all integers ranging from to , the cardinality of a set , and the lexicographic order for comparing sequences of integer values.
Two graphs and are isomorphic if there exists a bijection that preserves edges, i.e., . This problem belongs to , but it is not known to be in nor to be -complete, and it is conjectured to be -intermediate.
Throughout this paper, we consider a connected graph such that and , so that . We note the degree of a vertex .
3 Canonical BFS-based code
A code is a sequence of integer values, generated through a BFS of as described in Algorithm 1.
We use a FIFO queue to store the vertices that have been discovered but not yet treated, and we build two sequences and of and integer values, respectively. When a vertex is pushed in , we store in its -number, which ranges from 0 for the first pushed vertex to for the last pushed one. At each iteration of the while loop (Lines 4-13), a vertex is popped from , and we consider each vertex adjacent to .
-
If is reached for the first time (Lines 7-8), is said to be a forward edge. is the parent of , and it is added at the end of . At the end of Algorithm 1, there are forward edges which form a spanning tree of .
-
If has already been reached (Lines 9-10), is said to be a Backward edge, and and are added at the end of . To avoid treating twice a same edge, we add the condition that must be smaller than .
The final code is obtained by concatenating and (Line 14). Hence, a code is a sequence of integer values such that for each , is a forward edge and for each , is a backward edge. The first part of a code, from to is called forward code, whereas the second part of the code, from to is called backward code. When there is no ambiguity, commas between vertices in codes may be removed.
Different codes may be built for a same graph, depending on (i) which vertex is chosen first (Line 2) and (ii) the order considered to visit successors of (Line 6). In order to make the algorithm deterministic, we add a parameter which is a permutation of used to break ties: Line 2, the chosen vertex is the first vertex of ; Line 6, successors of are visited in the order defined by . The code obtained when using to break ties is denoted BFS.
Example 1.
Let us consider the graph displayed in Fig. 1 when the permutation is , as displayed in Fig. 1(i). The first vertex pushed in is , and we have . When is popped from , we iterate on its adjacent vertices in the order defined by :
-
when , we set to and add to , i.e., is a forward edge;
-
when , we set to and add to , i.e., is a forward edge;
-
when , we set to and add to , i.e., is a forward edge.
When is popped from , we iterate on its adjacent vertices in the order defined by :
-
when , we do nothing because edge has already been treated;
-
when , we add and in , i.e., is a backward edge;
-
when , we set to and add to , i.e., is a forward edge;
-
when , we add and in , i.e., is a backward edge.
When is popped from , we iterate on its adjacent vertices in the order defined by :
-
when or , we do nothing because edges and have already been treated;
-
when , we add and in , i.e., is a backward edge.
When is popped from , we iterate on its adjacent vertices in the order defined by :
-
when or , we do nothing because edges and have already been treated;
-
when , we add and in , i.e., is a backward edge.
When is popped from , we iterate on its adjacent vertices in the order defined by , i.e., , , and , but we do nothing as edges , , and have already been treated. Hence BFS.
Given a code, we can build the corresponding graph as the code allows us to reconstitute all its edges. For example, given BFS, we reconstitute the set of edges .
| (i) | (ii) | (iii) | |
|---|---|---|---|
As shown in Example 1, there exist different possible codes for , depending on the considered permutation . We define a total order on the set of all possible codes that may be associated with by considering a lexicographic order. Among all the possible codes for , the smallest one according to this order is called the canonical code of and it is unique.
Definition 2 (Canonical code ).
The canonical code of a graph is defined as when considering the lexicographic order to compare codes.
In the example of Figure 1, the canonical code is .
An important property to allow an efficient enumeration of canonical codes is that any prefix of a canonical code is a canonical code: this property allows us to stop completing a code whenever its prefix is not canonical.
Theorem 3.
Let be a canonical code.
-
For each , the prefix is a canonical code.
-
For each , the prefix is a canonical code.
Proof.
Let be the subgraph of that contains all edges defined by . Assume, for contradiction, that is not canonical, i.e., there exists another lexicographically smaller code that represents the same graph . We consider two cases.
Recall that our canonical code, derived from a BFS traversal, first lists the forward edges followed by the backward edges. In the first case, consists only of forward edges, implying that forms a tree. Since is a valid traversal of , it can be extended into a full traversal of by discovering the remaining vertices, and results in a code smaller than , which violates the canonicity of . In the second case, includes backward edges. If the forward edge code is already canonical, then the backward edges must follow a unique order, as dictated by lines 6-12 of Algorithm 1. A smaller would be extended to a full code smaller than by enumerating the remaining backward edges.
Thus, in both cases, we reach a contradiction, proving that every prefix of a canonical code is also canonical.
4 Basic CP model
We propose to generate codes with CP, thus allowing one to easily add other application-dependent constraints. Our CP model has the following integer variables:
-
for each , corresponds to the parent of vertex in the spanning tree (in other words, is a forward edge), and its domain is because the parent of must have been discovered before ;
-
for each , and correspond to the two endpoints of the th backward edge and their domain is because an edge incident to cannot be backward;
-
for each , corresponds to the degree of vertex , and its domain is .
- C1:
-
- C2:
-
- C3:
-
- C4:
-
- C5:
-
- C6:
-
- C7:
-
- C8:
-
- C9:
-
- C10:
-
The constraints are listed in Fig. 2. Constraints C1 to C7 are satisfied by any BFS-based code, even if it is not canonical:
-
C1 is a consequence of the fact that the vertex whose -number is 1 has been pushed in just after 0 and, therefore, its parent can only be 0.
-
C2 and C4 are consequences of the fact that, at each iteration of the while loop (Lines 4-13), the vertex popped from has a -number increased by one (as vertices are pushed in by increasing -number).
-
C3 is a consequence of the condition Line 9.
-
C5 expresses the fact that, for each backward edge , the parent of has been discovered before . Indeed, let us suppose that this is not the case, i.e., . In this case, the parent of would necessarily be because would not yet have been discovered when exploring the vertices adjacent to (Lines 7-8).
-
C6 and C7 relate degrees with edge variables.
Constraints C8 to C10 prevent us from generating some non-canonical codes, i.e., codes that are not the smallest possible ones:
-
C8 comes from the fact that starts with occurrences of : if there exists a vertex such that , then a smaller code is obtained by starting BFS from the vertex.
-
If C9 is not satisfied, then a smaller code is obtained by exchanging and .
-
if C10 is not satisfied, then a smaller code is obtained by visiting before .
We have experimentally compared this first CP model with the model introduced in [14] on a toy problem that aims at enumerating all graphs with vertices and edges. Both models have been implemented in Choco, and compute the same sets of solutions. However, our model is more efficient: when (resp. 6, 7, and 8), it needs 0.02 (resp. 0.1, 0.4, and 4.3) seconds whereas the model of [14] needs 0.04 (resp. 0.2, 1.4, and 28.4) seconds. This comes from the fact that (i) we order sibling vertices with respect to the number of children (thanks to constraint C10) instead of sub-tree weights, and (ii) we explicitly model backward edges with and variables instead of using an adjacency boolean matrix.
Constraints C1 to C10 allow us to enumerate all canonical BFS-based codes, but some non-canonical codes may also be enumerated. Hence, in the next two sections we introduce two additional properties that must be satisfied by canonical codes, and that are used to propagate a global constraint that ensures the canonicity of BFS-based codes, as described in Section 7.
5 Breaking Symmetries of Forward Codes with Vertex Labels
In this section, we introduce a property which is used in Section 7 to ensure the canonicity of prefixes of forward codes. Forward code prefixes correspond to trees, and the tree isomorphism problem may be solved in polynomial time by computing vertex labels [1]. We exploit similar labels but, unlike the original algorithm of [1], we do not rename labels with integer values at each level of the tree, but define the label of a vertex as a sequence that contains an integer value for each vertex in the subtree rooted in . This allows us to compute labels in a more incremental way during the search.
Definition 4 (Vertex label).
Let with be the prefix of a forward code. Let be the tree associated with this prefix. For every vertex , let be the subtree of rooted at , and be the sequence of all vertices in ordered by increasing value. The label of , denoted , is the sequence obtained by replacing every vertex in by where is the number of children of in the tree defined by the forward code .
For example, let us consider the tree displayed on the left of Fig. 3. The forward code associated with this tree is . When , we have and because and . When , we have and because , , , and .
Given the label of the root of a tree, we can reconstitute this tree as each value at position in gives the number of children of vertex . Hence, there is a bijection between each forward code and the label of the root of its associated tree. For example, we display in Fig. 3 two isomorphic trees and . corresponds to the forward code and the label of its root is , whereas corresponds to the forward code and the label of its root is .
An interesting property of vertex labels is that the children of a vertex in the tree associated with a canonical forward code prefix have non-decreasing labels, as stated below.
Property 1.
Let with be the prefix of a canonical forward code. We have: .
Proof.
Let and be the subtrees rooted at and , respectively. The level of a vertex in (resp. ) is its distance to the root (resp. ). Let be the ordered sequence of vertices in , and be the ordered sequence of vertices in . Since , and are sibling. Therefore, vertices at level in are discovered earlier than those at level in . Suppose by contradiction, i.e., . This can occur either when (i) is a prefix of or (ii) it exists such that . In both cases, if we swap the order of and , then all vertices in will be explored before those in at the same level, resulting in a larger label , which contradicts the canonicity of .
6 Breaking Symmetries of Backward Edges
Given a forward code prefix, we propose to exploit automorphisms in the tree associated with this prefix in order to break symmetries on backward edges. An automorphism of a tree is a permutation of which is an isomorphism from to . Given a forward code prefix , we only consider automorphisms that are still automorphisms on all possible extensions of to full canonical forward codes, as defined below.
Definition 5 ().
Let with be a prefix of a canonical forward code. is the set of all automorphisms that are still valid on all canonical extensions of , i.e., for any sequence such that is canonical, the automorphism such that and is an automorphism of the tree associated with , i.e., .
For example, let us consider the tree displayed in Fig. 3. The canonical code of is . Let us assume that , i.e., we still have to add three forward edges to the tree. Let be the automorphism that only exchanges vertices 4 and 5 and leaves unchanged all other vertices, i.e., , and . does not belong to . Indeed, there exist extensions of for which is not an automorphism. For example, if we add the forward edges and to , leading to the forward code , then is no longer an automorphism because the subtree rooted in 4 is no longer isomorphic to the subtree rooted in 5.
Now, let us consider the tree obtained from by adding the forward edge . The associated forward code is . In this case, belongs to because it is still an automorphism on all canonical extensions of . Indeed, 4 and 5 cannot be parent (because 6, which is at the same level as 4 and 5 but with a larger q-number, already has a child).
The following property allows us to exploit automorphisms to break symmetries on backward edges.
Property 2.
Let be a canonical code of .
The following property holds for any and any such that :
where is obtained from in two steps: (i) we compute the set of edges , and (ii) we sort the set of edges in lexicographically ascending order and concatenate them into a sequence of values.
Proof.
By definition, when applying to the tree associated with , we obtain a tree which is isomorphic to and, therefore, has the same canonical code as . For contradiction, let us assume that there exists such that . In this case, the code obtained from by replacing with is lexicographically smaller than , which is in contradiction with the fact that is canonical.
For example, let us consider the graph displayed in the left of Fig. 4. The permutation that exchanges 4 with 3 and leaves unchanged all other vertices belongs to . The code of is . In Step 1, we compute the set of edges obtained when applying the permutation to , i.e., . Then, we sort edges of in lexicographic order to obtain the sequence which is lexicographically smaller than . Hence, is not canonical.
7 Global CanonicalCode Constraint
Properties 1 and 2 are used to propagate the global constraint defined below.
Definition 6.
Let and be integer values, and , and be integer variables. The global constraint is satisfied iff is a BFS-based canonical code.
Propagation of Property 1
We exploit Property 1 to ensure that the forward code is canonical: When the domain of a variable with is reduced to a singleton, if the domain of is also reduced to a singleton for each , and if there exists such that and , then a failure is raised. Note that this ensures that the forward code is the smallest possible one when starting the search from vertex 0. However, if there exists a vertex such that , then it may be possible that a smaller forward code exists. Hence, to fully ensure that is canonical, we must build every tree starting from a vertex such that , build the associated smallest forward code, and check that it is not smaller than .
For an efficient propagation of Property 1, we maintain a 2 dimensional array such that, depending on whether , , or .
Propagation of Property 2
We exploit Property 2 to detect some cases where the backward code is not canonical: When the domain of a variable is reduced to a singleton, if the domains of and are also reduced to singletons for each , a failure is raised if there exists such that where is the largest value such that the domain of is reduced to a singleton for each . We use vertex labels to compute a partition of the vertices in orbits (vertices and are in a same orbit if and ), and we use this partition to compute automorphisms. However, to compute when , we need to discard automorphisms that may not be valid after the addition of new forward edges (as stated in Def. 5). More precisely, if two vertices and belong to a same orbit in , and the largest vertex in their respective subtrees and is smaller than , then this orbit is considered to compute because it is preserved in , as and cannot be further extended with forward edges.
Canonicity Check
Properties 1 and 2 are necessary conditions for canonicity, but they are not sufficient. For example, let us consider the tree displayed in Fig. 3, and let us assume we have added edges and to . In this case, the code is not canonical, though its forward code is canonical and it satisfies Properties 1 and 2. Indeed, the canonical code is 000112,23,34,46, and it is obtained by starting the BFS from vertex 4 of .
Hence, we need to check if there exists another BFS that leads to a smaller code (in which case we raise a failure). More precisely, let be the largest value such that the domain of is reduced to a singleton for each , let be the largest value such that the domains of both and are reduced to singletons for each , and let be the corresponding graph, i.e., and . If there exists a permutation of such that BFS, then we raise a failure. We exploit Constraint C8 to limit the set of permutations to those that start with a vertex that has the same degree as 0 in . We also exploit Constraints C9 and C10 as well as Properties 1 and 2 to break ties when choosing the next vertex to visit (Line 6 of Algo 1). Finally, we exploit the property LexBFS introduced in [6] to avoid some BFSs (that cannot lead to canonical codes) by breaking ties when choosing the next neighbor of to visit (Line 6). More precisely, for each neighbor of , let be the set of q-numbers of neighbors of that have already been numbered, and let be the sequence obtained by sorting elements of by increasing value. At each iteration of the loop Lines 6-11, we choose the vertex which has the smallest sequence , where a sequence is smaller than another sequence if is a prefix of or if (see [6] for more details).
First Experiments on Toy Problems
We have implemented our global constraint and our CP model in Choco [17] in a straightforward way111Our code is available at https://github.com/godotshaw/bfscanonicalcode.git, using a global count constraint to implement Constraints C6, C7, and C10.
To evaluate the interest of our global constraint, we consider three toy problems, denoted , and . aims at enumerating all graphs with vertices and edges. We denote this set of graphs. aims at enumerating all graphs of which have one vertex of degree and vertices of degree three. aims at enumerating all graphs of which have exactly two vertices of degree while all other vertices have degrees strictly lower than . Constraints on vertex degrees are defined in a very straightforward way as our CP model already has variables associated with degrees.
| - | -+ | - | - + | - | - + | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 15 | 0.02 | 2 | 0.01 | 3 | 0.00 | 1 | 0.00 | 0 | 0.00 | 0 | 0.00 |
| 7 | 8,959 | 0.29 | 126 | 0.23 | 70 | 0.00 | 2 | 0.00 | 0 | 0.01 | 0 | 0.01 |
| 9 | 9,736,406 | 177.2 | 26,631 | 9.77 | 3,507 | 0.11 | 3 | 0.01 | 756,497 | 20.83 | 5,804 | 1.88 |
| 11 | >9,740,145 | - | >318,456 | - | 286,884 | 9.17 | 5 | 0.07 | >8,593,667 | - | >835,095 | - |
In Table 1, we give experimental results obtained on an Intel Xeon E5-2623v3 of 3.0GHz with 32GB of RAM. For the three problems, the number of solutions when considering only Constraints C1 to C10 is much larger than the actual number of different graphs (given by column of -+). For example, when , only contains 2 different graphs but we generate 15 different codes. The addition of the global constraint allows us to compute only canonical codes, and this strongly reduces both the number of generated solutions and the CPU time. When adding constraints on vertex degrees, the number of canonical codes and, therefore, the CPU time are strongly decreased, especially for .
8 Application to the Generation of Connected Claw-Free Cubic Graphs
Cubic graphs are graphs in which each vertex has degree 3. A cubic graph is claw-free if it does not contain as an induced subgraph. This is equivalent to requiring that every vertex must participate in at least one triangle.
To enumerate all claw-free cubic graphs, we define a CP model composed of Constraint C1 to C10 combined with our global cc constraint. To enforce the cubic degree condition, we set the domain of each degree variable to , for each .
To ensure claw-freeness, we introduce a global constraint that ensures that each vertex is involved in at least one triangle. The propagator of this constraint maintains two sets of edges as proposed in [2]: mandatory edges, which must be in the solution, and possible edges, which may be included. We use sparse sets to efficiently maintain these sets [12].
A dynamic variable selection strategy is employed to ensure that is maintained throughout the search, where and are the depth levels of the last instantiated forward and backward edge variables, respectively. By prioritizing the instantiation of -variables while interleaving backward edge variables, we avoid the enumeration of useless spanning trees. Additionally, for every , is instantiated just after .
As the procedure for checking the canonicity is rather expensive, we introduce a parameter which allows us to control the frequency of canonicity checking: When , canonicity is checked after each edge assignment; when , it is checked every edge assignment. Of course, when all variables are assigned, the canonicity check is performed, whatever the value of is, in order to ensure that the global constraint is satisfied. A similar parameter is used in the SMT-based approach of [11].
Table 2 shows the results for by steps of 2, when our parameter belongs to . When , run times are very often longer than when . A good tradeoff is reached when .
In Table 2, we also display the results reported by Codish et al. in [10] (the code of this approach is not available). This approach does not break all symmetries and, therefore, it may compute redundant graphs that are isomorphic to previously enumerated graphs. For example, when , there are different graphs whereas the approach of [10] computes solutions. As a consequence, this approach does not scale well and cannot be used to solve larger instances within a reasonable amount of time. As a comparison, our method which only computes non-isomorphic graphs achieves a speed-up of more than 60 for (note however that the two approaches have been run on different computers).
We have adapted the SMT-based approach described in [11] to generate claw-free cubic graphs, and we display the results obtained with this approach, on the same computer as the one used in our experiments. To handle the claw-free constraint, we directly utilize the solver’s built-in feature, "–forbidden-induced-subgraphs", to prohibit the predefined induced subgraph . In [11], a parameter similar to is used to control the frequency of "minimality check", a non-polynomial operation that verifies whether the partially defined graph is canonical in their approach. We report results obtained with (the default value for this parameter is 20). The best results of SMT are obtained when , and our method consistently outperforms SMT across all instances.
Finally, we display results obtained with Nauty [13] (using geng -F to ensure claw-freeness). Our approach is always more efficient than Nauty, but it needs more memory (e.g., 444584Kb instead of 22760Kb when ).
| graphs | Approach of [10] | Our CP model -+ | SMT approach of [11] | Nauty | ||||||||
| sols | time | 1/1 | 1/10 | 1/20 | 1/30 | 1/1 | 1/10 | 1/20 | 1/30 | |||
| 20 | 15 | 132 | 2.1s | 0.5 | 0.4 | 0.5 | 0.5 | 1.6 | 0.8 | 1.5 | 1.3 | 0.5 |
| 22 | 27 | 307 | 3.9s | 1.0 | 0.7 | 0.8 | 0.9 | 3.3 | 1.9 | 2.5 | 3.5 | 2.3 |
| 24 | 54 | 660 | 11.0s | 1.2 | 1.2 | 1.3 | 1.6 | 9.4 | 3.6 | 4.2 | 5.3 | 10.5 |
| 26 | 94 | 1,835 | 45.4s | 2.2 | 1.6 | 2.0 | 2.5 | 19.9 | 5.3 | 6.5 | 7.7 | 48.6 |
| 28 | 181 | 4,372 | 2.57m | 3.7 | 2.5 | 2.7 | 4.2 | 33.5 | 10.6 | 15.1 | 18.5 | 218.2 |
| 30 | 369 | 10,567 | 6.60m | 6.4 | 4.9 | 4.3 | 8.0 | 63.2 | 20.8 | 21.5 | 29.0 | 3293.7 |
| 32 | 731 | 29,069 | 24.28m | 11.0 | 8.9 | 7.5 | 14.4 | 138.4 | 33.2 | 46.4 | 44.7 | - |
| 34 | 1,502 | - | - | 25.0 | 18.5 | 16.0 | 20.9 | 313.6 | 70.2 | 87.6 | 82.4 | - |
| 36 | 3,187 | - | - | 59.7 | 32.9 | 44.4 | 37.3 | 512.9 | 135.5 | 177.7 | 190.6 | - |
| 38 | 6,914 | - | - | 144.7 | 92.8 | 128.1 | 77.6 | 935.6 | 249.6 | 328.7 | 348.9 | - |
| 40 | 15,025 | - | - | 421.9 | 232.2 | 295.7 | 206.8 | 1936 | 548.8 | 734.0 | 775.0 | - |
| 42 | 33,687 | - | - | 1195 | 456.2 | 589.9 | 594.0 | 4872 | 1420 | 1526 | 1511 | - |
| 44 | 77,450 | - | - | 3094 | 1133 | 1142 | 1461 | - | 2979 | - | 3584 | - |
Graph contraction
To show the versatility of our approach, we show how to exploit a property introduced in [7] to speed-up the generation of claw-free cubic graphs.
Proposition 7 (Claim A in [7]).
| (a) | (b) |
Hence, a claw-free cubic graph can be contracted by replacing specific patterns with labeled meta-vertices. Let us first define the basic contraction operation.
Definition 8 (Contraction).
Given a graph , a pattern graph , and a set such that the subgraph of induced by is isomorphic to , the contraction of with respect to and is the multigraph such that the occurrence of in is replaced with a single meta-vertex , i.e., and . The meta-vertex is called a -meta-vertex.
is a multiset (and therefore is a multigraph) because there may exist a vertex such that several vertices of are adjacent to .
We consider a contracted graph obtained by contracting triangles and diamonds. In a cubic graph, two diamonds cannot share a same vertex. Also, two triangles that are not in a diamond cannot share a same vertex. To ensure a deterministic process such that the contracted graph is unique, we contract diamonds before triangles.
Definition 9 (Contracted graph ).
Given a claw-free cubic graph , the contracted graph is the graph obtained by (i) contracting every occurrence of the diamond pattern in into a -meta-vertex, and (ii) contracting every occurrence of the triangle pattern into an -meta-vertex.
We display in Fig. 6 an example of contracted graph .
Proposition 7 ensures us that only contains meta-vertices. As is a cubic graph and the triangle has 3 vertices of degree 2, the degree of every -meta-vertex is 3. Similarly, the degree of every -meta-vertex in is 2 because the diamond has 2 vertices of degree 3.
Proposition 10.
There are exactly 3 patterns that may create multi-edges in . These patterns, named , , and , are displayed in Fig. 7.
Proof.
A bridge in a connected graph is an edge such that the graph without this edge is no longer connected. There are only two possible claw-free cubic graphs that contain no bridge. These two graphs are displayed in Fig. 8 (first two graphs on the left) and they are treated as special cases. All other claw-free cubic graphs contain at least one bridge. Let be one of these graphs and be the graph obtained from by removing all bridges. is composed of connected components such that each connected component is a claw-free graph that does not contain bridges and that contains at least one vertex of degree 2. The multi-edges in are also multi-edges in . Each connected component of is necessarily one of the three graphs displayed in Fig. 7. Hence, to remove all multi-edges, we contract as defined below, starting with occurrences of because is a subgraph of .
| (i) | (ii) | (iii) |
Definition 11 (Contracted graph ).
Given a contracted graph , the contracted graph is obtained from by (i) contracting every occurrence of into an -meta-vertex, then (ii) contracting every occurrence of or into a -meta-vertex or a -meta-vertex.
We display in Fig. 6 an example of contracted graph . We can show that for any initial claw-free cubic graph , the contracted graph is either one of the 5 graphs displayed in Fig. 8, which are treated as special cases, or it is a simple connected graph that does not contain multi-edges and that contains 5 different kinds of meta-vertices: the degree of meta-vertices of type (resp. , and ) is 3 (resp. 2, 1, 1, and 2). The meta-vertices of define a partition of the vertices of : each meta-vertex of type (resp. , and ) corresponds to a set of 3 (resp. 4, 6, 7, and 9) vertices of .
Proposition 12.
Let and be two claw-free cubic graphs, and let and be their corresponding contracted graphs. Then and are isomorphic if and only if and are isomorphic.
Proof.
To prove the proposition, we establish that the contraction process from a claw-free cubic graph to its contracted graph is bijective. First, the contraction process following Definition 9 and Definition 11 is deterministic. Additionally, we label each vertex in according to the subgraph pattern from which it was contracted, ensuring that retains information about the structure of . Therefore, the contracted graph is uniquely determined by . Conversely, given , we can reconstruct uniquely by replacing each meta-vertex in with its corresponding subgraph pattern. Since each subgraph pattern is symmetric with respect to the vertices of degree 2, there is no ambiguity in reconnecting the subgraphs. The connections between these subgraphs follow directly from the adjacency structure of , ensuring a unique reconstruction of .
This uniqueness in both directions (from to and from to ) implies that and are isomorphically equivalent.
| (c) | (d) | (e) | |||
The enumeration of claw-free cubic graphs can now be simplified to the enumeration of contracted graphs (except when in which case we must add the special cases displayed in Fig. 8). These contracted graphs have labels associated with vertices. Hence, we introduce a new canonical code for labeled graphs.
Definition 13 (Canonical code for a labeled graph).
Let be a connected graph, be a finite set of labels, and be a vertex labeling function. A code is a sequence , where , is a BFS-based code of and is the label of the vertex whose -number is . The canonical code, denoted , is the lexicographically smallest among all possible codes generated by different BFS traversals of .
Under this definition, the canonical code for the contracted graph in Fig. 6 is represented as . Given a connected claw-free cubic graph of order , the size of the corresponding contracted graph is not fixed but instead depends on the types and frequencies of patterns present in . Hence, we introduce a new integer variable for each that corresponds to the number of -meta-vertices. We also introduce two new integer variables and that correspond to the number of variables and edges of the contracted graph . Finally, for each , we introduce a variable which represents the label of the th meta-vertex of . The following constraints must be satisfied:
-
-
-
-
Connectedness:
-
Occurrence of vertices with the same label:
-
Degree constraints:
,
In this model, the 5 special graphs shown in Fig 8 are not addressed. However, they are accounted for when counting the number of solutions for .
We add to these constraints the constraints C1 to C10 of Fig. 2, while replacing and with and . We also extend our global canonicity constraint by ensuring that the label sequence is the smallest possible one under all possible BFS traversals.
Table 3 presents the solving time for enumerating all contracted graphs corresponding to all claw-free cubic graphs for when the frequency parameter is set to 1 (greater values for do not improve results). Since the contracted graph is significantly smaller than the original claw-free cubic graph, we observe a notable reduction in solving times. For instance, when , generating 6914 solutions takes only 2.2 seconds, whereas enumerating the primary claw-free cubic graphs requires 32.9 seconds. With this approach, we can generate graphs up to within 9 hours.
| graphs | time | graphs | time | graphs | time | |||||
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 15 | 0.1s | 34 | 1,502 | 1.6s | 48 | 418,112 | 131.2s | ||
| 22 | 27 | 0.2s | 36 | 3,187 | 2.2s | 50 | 1,005,927 | 293.9s | ||
| 24 | 54 | 0.3s | 38 | 6,914 | 3.2s | 52 | 2,412,987 | 756.0s | ||
| 26 | 94 | 0.3s | 40 | 15,025 | 5.9s | 54 | 5,934,636 | 1,877.8s | ||
| 28 | 181 | 0.4s | 42 | 33,687 | 11.8s | 56 | 14,823,532 | 4,526.9s | ||
| 30 | 369 | 0.7s | 44 | 77,450 | 23.6s | 58 | 37,005,614 | 11,836.1s | ||
| 32 | 731 | 0.8s | 46 | 177,465 | 54.1s | 60 | 94,412,125 | 30,948.9s |
9 Conclusion
We introduce a new canonical graph encoding based on a BFS traversal, which is well-suited for constraint-based approaches. Using this encoding, we define a CP model that enumerates all non-isomorphic graphs satisfying specific constraints by generating their corresponding canonical codes. The key aspect of this encoding is identifying a spanning tree that yields the lexicographically smallest representation of the graph. We demonstrate how this encoding enables the formulation of fundamental static symmetry-breaking constraints. Additionally, we leverage the crucial property that every prefix of a canonical code must also be canonical, allowing us to define a global constraint that dynamically enforces symmetry-breaking, ensuring the canonicity of the generated codes.
We tested our approach on connected claw-free cubic graphs, which are regular and inherently challenging for BFS-based methods. However, our experimental results demonstrate that our approach outperforms existing state-of-the-art techniques based on adjacency matrix representations. Beyond this, we explored the properties of local structures within this class of graphs that allow us to contract graphs into more compact labeled graphs. Our BFS-based canonical code may be easily extended to labeled graphs, and this allows us to enumerate graphs of larger order, up to 60 vertices. As part of our future work, we aim to refine our encoding for such labeled graphs and develop dedicated propagation strategies.
Our approach is well-suited for enumerating graphs with a limited diameter, such as diameter-2-critical graphs and required-girth extremal graphs [11]: In these cases, the number of canonical spanning trees remains manageable, allowing them to be precomputed and used for static symmetry breaking. We plan to publish further results on these aspects in the near future, as they could not be included in this work due to space limitations.
References
- [1] Alfred V Aho and John E Hopcroft. The design and analysis of computer algorithms. Pearson Education India, 1974.
- [2] Nicolas Beldiceanu, Irit Katriel, and Xavier Lorca. Undirected forest constraints. In International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, pages 29–43. Springer, 2006. doi:10.1007/11757375_5.
- [3] Michael Codish, Graeme Gange, Avraham Itzhakov, and Peter J Stuckey. Breaking symmetries in graphs: the nauty way. In Principles and Practice of Constraint Programming: 22nd International Conference, CP 2016, Toulouse, France, September 5-9, 2016, Proceedings 22, pages 157–172. Springer, 2016. doi:10.1007/978-3-319-44953-1_11.
- [4] Michael Codish, Alice Miller, Patrick Prosser, and A Stuckey. Breaking symmetries in graph representation, 2013.
- [5] Laurianne David, Amol Thakkar, Rocío Mercado, and Ola Engkvist. Molecular representations in ai-driven drug discovery: a review and practical guide. Journal of cheminformatics, 12(1):56, 2020. doi:10.1186/S13321-020-00460-5.
- [6] Michel Habib and Christophe Paul. A simple linear time algorithm for cograph recognition. Discrete Applied Mathematics, 145(2):183–197, 2005. doi:10.1016/J.DAM.2004.01.011.
- [7] Michael A Henning and Christian Löwenstein. Locating-total domination in claw-free cubic graphs. Discrete Mathematics, 312(21):3107–3116, 2012. doi:10.1016/J.DISC.2012.06.024.
- [8] Marijn JH Heule. Optimal symmetry breaking for graph problems. Mathematics in Computer Science, 13:533–548, 2019. doi:10.1007/S11786-019-00397-5.
- [9] Avraham Itzhakov and Michael Codish. Breaking symmetries in graph search with canonizing sets. Constraints, 21:357–374, 2016. doi:10.1007/S10601-016-9244-Z.
- [10] Avraham Itzhakov and Michael Codish. Breaking symmetries with high dimensional graph invariants and their combination. In International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pages 133–149. Springer, 2023. doi:10.1007/978-3-031-33271-5_10.
- [11] Markus Kirchweger and Stefan Szeider. Sat modulo symmetries for graph generation and enumeration. ACM Transactions on Computational Logic, 2024.
- [12] Vianney Le Clément de Saint-Marcq, Pierre Schaus, Christine Solnon, and Christophe Lecoutre. Sparse-Sets for Domain Implementation. In CP workshop on Techniques foR Implementing Constraint programming Systems (TRICS), pages 1–10, Uppsala, Sweden, September 2013. URL: https://hal.science/hal-01339250.
- [13] Brendan D McKay and Adolfo Piperno. Practical graph isomorphism, ii. Journal of symbolic computation, 60:94–112, 2014. doi:10.1016/J.JSC.2013.09.003.
- [14] Vyacheslav Moklev and Vladimir Ulyantsev. Bfs enumeration for breaking symmetries in graphs, 2018. arXiv:1804.02273.
- [15] Siegfried Nijssen and Joost N Kok. The gaston tool for frequent subgraph mining. Electronic Notes in Theoretical Computer Science, 127(1):77–87, 2005. doi:10.1016/J.ENTCS.2004.12.039.
- [16] Xiao Peng and Christine Solnon. Using canonical codes to efficiently solve the benzenoid generation problem with constraint programming. In 29th International Conference on Principles and Practice of Constraint Programming (CP 2023). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2023.
- [17] Charles Prud’homme and Jean-Guillaume Fages. Choco-solver: A java library for constraint programming. Journal of Open Source Software, 7(78):4708, 2022. doi:10.21105/joss.04708.
- [18] Vladimir Ulyantsev, Ilya Zakirzyanov, and Anatoly Shalyto. Bfs-based symmetry breaking predicates for DFA identification. In Language and Automata Theory and Applications - 9th International Conference, LATA 2015, volume 8977 of Lecture Notes in Computer Science, pages 611–622. Springer, 2015. doi:10.1007/978-3-319-15579-1_48.
- [19] David Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1):31–36, 1988. doi:10.1021/CI00057A005.
- [20] Xifeng Yan and Jiawei Han. Closegraph: mining closed frequent graph patterns. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 286–295, 2003. doi:10.1145/956750.956784.
