Automatic Proofs for Formulae Enumerating Proper Polycubes

We develop a general framework for computing formulae enumerating polycubes of size n which are proper in n−k dimensions (spanning all n−k dimensions), for a fixed value of k. Besides the fundamental importance of knowing the number of these simple combinatorial objects, such formulae are central in the literature of statistical physics in the study of percolation processes and the collapse of branched polymers. We re-affirm the already-proven formulae for k ≤ 3, and prove rigorously, for the first time, that the number of polycubes of size n that are proper in n−4 dimensions is $2^{n-7} n^{n-9} (n-4)(8n^8 - 128n^7 + 828n^6 - 2930n^5 + 7404n^4 - 17523n^3 + 41527n^2 - 114302n + 204960)/6$.

Emails: barequet@cs.technion.ac.il, mshalah@cs.technion.ac.il. Available online at www.sciencedirect.com (www.elsevier.com/locate/endm).


Introduction
A d-dimensional polycube of size n is a connected set of n cubes in d dimensions, where connectivity is through (d−1)-dimensional faces. Two fixed polycubes are considered distinct if they differ in their shapes or orientations. A polycube is called proper in d dimensions if the convex hull of the centers of its cubes is d-dimensional. Following Lunnon [4], we let DX(n, d) denote the number of fixed polycubes of size n that are proper in d dimensions.
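As an illustration of these definitions, the following minimal sketch checks face-connectivity of a set of cells and reports in how many dimensions the set is proper. It assumes that cells are given as integer coordinate tuples; the function names are hypothetical and are not taken from the paper. For a connected polycube, the dimension of the convex hull of the cell centers equals the number of coordinates that take more than one value, which is the shortcut the sketch relies on.

```python
from collections import deque

def is_connected(cells):
    """True if the cells are connected through shared (d-1)-dimensional faces."""
    cells = set(cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:
        c = queue.popleft()
        for axis in range(len(c)):
            for step in (-1, 1):
                nb = c[:axis] + (c[axis] + step,) + c[axis + 1:]
                if nb in cells and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return len(seen) == len(cells)

def proper_dimension(cells):
    """Number of axes along which a connected polycube actually extends."""
    cells = list(cells)
    return sum(1 for axis in range(len(cells[0]))
               if len({c[axis] for c in cells}) > 1)

# An L-shaped tromino lying in a plane of 3-space: size 3, proper in 2 dimensions.
l_tromino = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
```

Here `proper_dimension(l_tromino)` is 2, so this shape would be counted by DX(3, 2).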
Enumeration of polycubes and computing their asymptotic growth rate are important problems in combinatorics and discrete geometry, originating in statistical physics [3], where they play a fundamental role in the analysis of percolation processes and the collapse of branched polymers. To date, no formula is known for $A_d(n)$, the number of fixed polycubes of size n in d dimensions, for any fixed value of d. The main interest in DX stems from the fact that $A_d(n)$ can be easily computed using the formula $A_d(n) = \sum_{i=0}^{d} \binom{d}{i} DX(n, i)$ (Lunnon [4]). In a matrix listing the values of DX, the top-right triangular half and the main diagonal contain only 0s. This gives rise to the question of whether a pattern can be found in the sequences DX(n, n−k), where k > 0 is the ordinal number of the diagonal. Significant progress in estimating $\lambda_d$, the asymptotic growth rate of the number of polycubes in d dimensions, has been obtained in the literature of statistical physics, although the computations usually relied on unproven assumptions and on formulae for DX(n, n−k) interpolated empirically from known values of $A_d(n)$. Peard and Gaunt [7] predicted that for k > 1, the diagonal formula DX(n, n−k) has the pattern $2^{n-2k+1} n^{n-2k-1} (n-k) h_k(n)$, where $h_k(n)$ is a polynomial in n, and conjectured explicit formulae for $h_k(n)$ for k ≤ 6. Luther and Mertens [5] conjectured a formula for k = 7.
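Lunnon's formula can be exercised with a few lines of code. The sketch below is only an illustration: the function name and the dictionary layout are assumptions, and the values DX(4, 1) = 1, DX(4, 2) = 17 and DX(4, 3) = 32 are supplied as inputs (they agree with the diagonal formulae quoted in this section when evaluated at n = 4).

```python
from math import comb

def fixed_polycubes(d, dx_row):
    """A_d(n) = sum_{i=0}^{d} C(d, i) * DX(n, i), with dx_row[i] = DX(n, i)."""
    return sum(comb(d, i) * dx_row.get(i, 0) for i in range(d + 1))

# DX(4, i) for i = 1, 2, 3; every other entry of this row is 0.
dx_4 = {1: 1, 2: 17, 3: 32}
a2 = fixed_polycubes(2, dx_4)  # fixed polyominoes of size 4
a3 = fixed_polycubes(3, dx_4)  # fixed 3-dimensional polycubes of size 4
```

With these inputs the sketch gives 19 fixed tetrominoes and 86 fixed tetracubes.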
It is easy to show that $DX(n, n-1) = 2^{n-1} n^{n-3}$ (seq. A127670 in [6]). Barequet et al. [2] proved for the first time that $DX(n, n-2) = 2^{n-3} n^{n-5} (n-2)(2n^2 - 6n + 9)$ (seq. A171860). The proof uses a case analysis of the possible structures of spanning trees of the polycubes, and the various ways in which cycles can be formed in their cell-adjacency graphs. Similarly, Asinowski et al. [1] proved that $DX(n, n-3) = 2^{n-6} n^{n-7} (n-3)(12n^5 - 104n^4 + 360n^3 - 679n^2 + 1122n - 1560)/3$, again, by counting spanning trees of polycubes, yet the reasoning and calculations were significantly more involved. The inclusion-exclusion principle was applied in order to count correctly polycubes whose cell-adjacency graphs contained certain subgraphs, so-called "distinguished structures." In comparison with the case k=2, the number of such structures is substantially higher, and the ways in which they can appear in spanning trees are much more varied. The latter proof provided a better understanding of the difficulties that one would face in applying this technique to higher values of k.
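The three diagonal formulae can be evaluated exactly for small n as a sanity check. The sketch below uses rational arithmetic because the exponents n−5 and n−7 are negative for small n; the function names are hypothetical.

```python
from fractions import Fraction

def dx_diag1(n):
    """DX(n, n-1) = 2^(n-1) * n^(n-3)."""
    return Fraction(2) ** (n - 1) * Fraction(n) ** (n - 3)

def dx_diag2(n):
    """DX(n, n-2) = 2^(n-3) * n^(n-5) * (n-2) * (2n^2 - 6n + 9)."""
    return (Fraction(2) ** (n - 3) * Fraction(n) ** (n - 5)
            * (n - 2) * (2 * n**2 - 6 * n + 9))

def dx_diag3(n):
    """DX(n, n-3) as proved by Asinowski et al."""
    poly = 12 * n**5 - 104 * n**4 + 360 * n**3 - 679 * n**2 + 1122 * n - 1560
    return Fraction(2) ** (n - 6) * Fraction(n) ** (n - 7) * (n - 3) * poly / 3

# Rows n = 4 and n = 5 of the DX matrix, read along the first three diagonals.
row4 = (dx_diag3(4), dx_diag2(4), dx_diag1(4))  # DX(4,1), DX(4,2), DX(4,3)
row5 = (dx_diag3(5), dx_diag2(5), dx_diag1(5))  # DX(5,2), DX(5,3), DX(5,4)
```

For n = 4 the three values are 1, 17 and 32, and for n = 5 they are 61, 348 and 400.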
The number of distinguished structures grows rapidly, the inclusion relations between them are much more complicated, and the ways in which they are connected by forests are much more varied. As anticipated in [1], carrying out a similar proof by hand for k > 3 is impractical.
In this paper we create a theoretical set-up for proving the formulae for DX(n, n−k) for a fixed value of k. Our method fully automates the manual method presented in [1,2]. For this nontrivial generalization, we need a few key observations about polycubes that are proper in n−k dimensions. We also provide a general characterization of distinguished structures, and design algorithms that produce and analyze them automatically. Using our implementation of this method, we find the explicit formula (which has never been proven before) for DX(n, n−4), stated in the following theorem.

Theorem 1.1 $DX(n, n-4) = 2^{n-7} n^{n-9} (n-4)(8n^8 - 128n^7 + 828n^6 - 2930n^5 + 7404n^4 - 17523n^3 + 41527n^2 - 114302n + 204960)/6$.
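The formula of Theorem 1.1 can be spot-checked against values that are easy to obtain independently. The sketch below (hypothetical function name, exact rational arithmetic) evaluates the right-hand side for n = 5 and n = 6. DX(5, 1) must equal 1, since the only polycube proper in one dimension is a straight line, and DX(6, 2) must equal the number of fixed hexominoes, 216, minus the two straight ones.

```python
from fractions import Fraction

def dx_diag4(n):
    """Right-hand side of Theorem 1.1, evaluated exactly."""
    poly = (8 * n**8 - 128 * n**7 + 828 * n**6 - 2930 * n**5 + 7404 * n**4
            - 17523 * n**3 + 41527 * n**2 - 114302 * n + 204960)
    return Fraction(2) ** (n - 7) * Fraction(n) ** (n - 9) * (n - 4) * poly / 6

dx_5_1 = dx_diag4(5)
dx_6_2 = dx_diag4(6)
```

Both checks pass: the formula gives 1 and 214, respectively.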

Overview of the Method
Denote by $\mathcal{P}_n$ the set of proper polycubes of size n in n−k dimensions. Let $P \in \mathcal{P}_n$, and let γ(P) denote the adjacency graph of P constructed as follows: The vertices correspond to the cells of P; two vertices are connected by an edge if their corresponding cells are adjacent; and an edge has label i (1 ≤ i ≤ n−k) if the corresponding cells differ in their ith coordinate. The direction of the edge is from the lower to the higher cell. See Figure 1 for an example. Since P → γ(P) is an injection, it suffices to count the graphs obtained from the members of $\mathcal{P}_n$ in this way. We count these graphs by counting their spanning trees. A spanning tree of γ(P) has n−1 edges labeled by numbers from the set {1, 2, ..., n−k}; all these labels are present because the polycube is proper in n−k dimensions. Hence, n−k edges of the tree are labeled with the labels 1, 2, ..., n−k, and the remaining k−1 edges repeat labels from the same set. There is a bijection between the possibilities of repeated edge labels and the partitions of the integer k−1. Specifically, each partition $p = \{a_1, \ldots, a_h\}$ (where $\sum_{i=1}^{h} a_i = k-1$) corresponds to h repeated labels in the spanning tree, such that the ith repeated label appears $a_i + 1$ times. In such a case we say that the tree is "labeled according to p." We denote by Π(m) the set of all partitions of the integer m. When we consider a spanning tree of γ(P), we distinguish a repeated label i which appears r times by $i, i', \ldots, i^{(r-1)}$.

[Figure caption: A dotted line is drawn between every pair of neighboring cells and around every pair of coinciding cells.]
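The correspondence between partitions of k−1 and patterns of repeated labels can be made concrete with a short sketch. The generator and helper names below are hypothetical; the code enumerates Π(k−1) and, for each partition, lists how many times each repeated label appears.

```python
def partitions(m, largest=None):
    """Yield the partitions of m as non-increasing tuples."""
    if largest is None:
        largest = m
    if m == 0:
        yield ()
        return
    for first in range(min(m, largest), 0, -1):
        for rest in partitions(m - first, first):
            yield (first,) + rest

def label_multiplicities(p):
    """Each part a_i stands for one repeated label that appears a_i + 1 times."""
    return tuple(a + 1 for a in p)

k = 4
table = {p: label_multiplicities(p) for p in partitions(k - 1)}
```

For k = 4 the table maps (3,) to (4,), (2, 1) to (3, 2), and (1, 1, 1) to (2, 2, 2). In every case the n−k−h unrepeated labels together with the repeated ones account for exactly n−1 edges.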
To compute $|\mathcal{P}_n|$, we consider all possible directed edge-labeled trees of size n with the possible repetitions of edge labels, and count only those that represent valid polycubes, then derive the actual number of polycubes.
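The adjacency graph γ(P) described in this section can be sketched as follows. The representation is an assumption made for illustration: cells are integer coordinate tuples, and each edge is returned as a triple (lower cell, higher cell, label), with labels numbered from 1.

```python
def adjacency_graph(cells):
    """Return the directed labeled edges (lower, higher, label) of gamma(P).

    Label i means the two cells differ by one in coordinate i; the edge
    points from the cell with the lower coordinate to the higher one.
    """
    cells = set(cells)
    edges = []
    for c in sorted(cells):
        for axis in range(len(c)):
            nb = c[:axis] + (c[axis] + 1,) + c[axis + 1:]
            if nb in cells:
                edges.append((c, nb, axis + 1))
    return edges

square = [(0, 0), (1, 0), (0, 1), (1, 1)]
edges = adjacency_graph(square)
labels = {label for _, _, label in edges}
```

The 2×2 square has four cells and four adjacency edges, one more than a spanning tree, so γ(P) contains a cycle and the polycube has several spanning trees. Both labels 1 and 2 occur, as expected for a polycube that is proper in two dimensions.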

Distinguished Structures
For each directed edge-labeled tree, one can attempt to build the corresponding polycube. In this process there are two types of problems: (a) Cells may coincide (a tree with overlapping cells is invalid, see Figures 2(a,d)); and (b) Two cells which are not connected by a tree edge may be adjacent (such a tree corresponds to a polycube P with cycles in γ(P), and, therefore, its spanning tree is not unique, see Figures 2(b,e)). A distinguished structure is the union of all paths (edges and incident vertices) that connect two coinciding or adjacent cells. This characterization allows the design of an algorithm for producing $DS_k$, the set of all distinguished structures in n−k dimensions. We begin by generating all "free trees" (non-isomorphic trees) of size at most the value specified in Lemma 3.1. Then, we process each free tree T of size t by labeling its edges according to every partition $p \in \bigcup_{i=1}^{k-1} \Pi(i)$ so as to obtain a directed edge-labeled tree T′, and then checking (by a DFS traversal) whether T′ contains coinciding or neighboring cells. T′ is added to $DS_k$ if it is not isomorphic to any structure of size t already in $DS_k$, and if at least one of the following conditions holds: (i) T′ contains two coinciding or neighboring cells which are connected by a path with t−1 edges (see, e.g., Figures 2(a,b,d,e)); (ii) T′ is isomorphic to the union of $d_1, \ldots, d_m \in DS_k$, such that the isomorphic copies of $d_1, \ldots, d_m$ in T′ cover all its edges (see, e.g., Figures 2(c,g)). Disconnected distinguished structures (see, e.g., Figure 2(f)) are generated by checking if collections of edge-connected structures in $DS_k$ yield a single disconnected structure labeled according to $p \in \bigcup_{i=1}^{k-1} \Pi(i)$.

Lemma 3.1 A connected (resp., disconnected) distinguished structure in $DS_k$ has at most 3k − 2 (resp., 4k) vertices.

Lemma 3.2 Let $T_p$ denote the number of directed trees with n vertices labeled according to p ∈ Π(k−1). Then, $T_p = \pi(p) \binom{n-k}{|p|} 2^{n-1} n^{n-3}$.
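The check for coinciding or neighboring cells can be sketched as a traversal that assigns coordinates to the vertices of a directed edge-labeled tree. This is a minimal illustration, not the authors' implementation: it assumes vertices are numbered 0..n−1, that an edge (u, v, label) means cell v lies one step above cell u along the axis named by the label, and that coordinates are stored sparsely as dictionaries.

```python
from collections import defaultdict
from itertools import combinations

def embed(tree_edges):
    """Place the cells by a DFS from vertex 0; returns vertex -> {axis: offset}."""
    adj = defaultdict(list)
    for u, v, label in tree_edges:
        adj[u].append((v, label, +1))
        adj[v].append((u, label, -1))
    pos = {0: {}}
    stack = [0]
    while stack:
        u = stack.pop()
        for v, label, sign in adj[u]:
            if v not in pos:
                p = dict(pos[u])
                p[label] = p.get(label, 0) + sign
                pos[v] = {a: x for a, x in p.items() if x != 0}
                stack.append(v)
    return pos

def conflicts(n, tree_edges):
    """Return (coinciding pairs, neighboring pairs not joined by a tree edge)."""
    pos = embed(tree_edges)
    joined = {frozenset((u, v)) for u, v, _ in tree_edges}
    coincide, neighbors = [], []
    for u, v in combinations(range(n), 2):
        axes = set(pos[u]) | set(pos[v])
        dist = sum(abs(pos[u].get(a, 0) - pos[v].get(a, 0)) for a in axes)
        if dist == 0:
            coincide.append((u, v))
        elif dist == 1 and frozenset((u, v)) not in joined:
            neighbors.append((u, v))
    return coincide, neighbors

# Path 0 -> 1 (label 1), 1 -> 2 (label 2), 3 -> 2 (label 1): cells 0 and 3 touch.
square_path = [(0, 1, 1), (1, 2, 2), (3, 2, 1)]
# Extending it with 4 -> 3 (label 2) puts cell 4 on top of cell 0.
overlap_path = square_path + [(4, 3, 2)]
```

The first path closes a unit square, so its end cells are neighbors although no tree edge joins them (problem (b)); the second path returns to its starting position, so two cells coincide (problem (a)).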
Lemma 3.3 Let $\sigma \in DS_k$ be composed of $k^* \ge 1$ trees $s_1, \ldots, s_{k^*}$ with a total of $n^*$ vertices and distinct edge labels. The number of occurrences of σ in trees of size n with distinct edge labels is $F_n(\sigma) = N \frac{(n-n^*+k^*-1)!}{(n-n^*)!} n^{n-n^*+k^*-2}$, where $N = \prod_{i=1}^{k^*} |s_i|$.

Proof. (Sketch) We proceed by double counting, enumerating in two ways the different sequences of directed edges that can be added to a graph with $n-n^*$ vertices and a structure σ so as to form a rooted tree. One way is to add the edges one by one: There are $N = \prod_{i=1}^{k^*} |s_i|$ options to choose a root for each component. We begin with a forest with $n-n^*+k^*$ rooted trees. After adding a set of edges forming a rooted forest with i trees, there are n(i−1) choices for the next edge. Therefore, the total number of choices is $N \prod_{i=2}^{n-n^*+k^*} n(i-1)$. Another way to count these sequences is to start with an unrooted edge-labeled tree containing σ, choose one of its n vertices as a root, and choose one of the $(n-n^*)!$ possible sequences, say, η, label the $n-n^*$ vertices which do not belong to σ according to η, and "shift" each vertex label to the incident edge towards the root. The number of sequences formed this way is $n F_n(\sigma) (n-n^*)!$. Hence, $F_n(\sigma) = N \frac{(n-n^*+k^*-1)!}{(n-n^*)!} n^{n-n^*+k^*-2}$, as claimed. □
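The algebra of the double-counting argument can be checked numerically. The sketch below (hypothetical function names) evaluates the closed form for F_n(σ) and compares the two counts used in the proof; a structure is described only by the vertex counts of its trees, which is all the formula depends on.

```python
from fractions import Fraction
from math import factorial, prod

def occurrences(n, sizes):
    """F_n(sigma) from the closing line of the proof of Lemma 3.3.

    sizes lists |s_1|, ..., |s_{k*}|, the vertex counts of the trees of sigma.
    """
    n_star, k_star, big_n = sum(sizes), len(sizes), prod(sizes)
    m = n - n_star + k_star            # number of rooted trees at the start
    return (big_n * Fraction(factorial(m - 1), factorial(n - n_star))
            * Fraction(n) ** (m - 2))

def edge_sequences(n, sizes):
    """First count in the proof: N * prod_{i=2}^{m} n (i - 1)."""
    n_star, k_star = sum(sizes), len(sizes)
    m = n - n_star + k_star
    return prod(sizes) * prod(n * (i - 1) for i in range(2, m + 1))

n, sizes = 6, [3]
lhs = edge_sequences(n, sizes)
rhs = n * occurrences(n, sizes) * factorial(n - sum(sizes))
```

For a single tree on three vertices inside trees of size 6, both counts equal 3888, which gives F_6(σ) = 108. For a structure consisting of a single vertex the formula reduces to n^(n−2).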

Inclusion-Exclusion
Let $\vec{F}_n(\sigma)$ denote the number of occurrences of σ in directed edge-labeled trees of size n. Each of the $n-n^*+k^*-1$ tree edges outside σ can be directed in two ways, so $\vec{F}_n(\sigma) = 2^{n-n^*+k^*-1} F_n(\sigma)$. Let the distinguished structure $\sigma' \in DS_k$ be labeled according to $p' \in \bigcup_{i=1}^{k-1} \Pi(i)$. Let us denote by $O_p(\sigma')$ the number of occurrences of σ′ in directed trees of size n that are labeled according to p ∈ Π(k−1). Computing $O_p(\sigma')$ involves choosing the |p| repeated labels in the tree and the labels of σ′, counting the automorphisms of σ′, and multiplying by $\vec{F}_n(\sigma')$.
When counting the occurrences of a structure $\sigma \in DS_k$, other distinguished structures which contain multiple occurrences of σ are counted multiple times. In order to count correctly, we build an inclusion-exclusion graph IE=(V, E) whose vertices correspond to the structures in $DS_k$. There is an edge $e = \sigma_1 \to \sigma_2$ labeled with c if $\sigma_1$ contains c occurrences of $\sigma_2$. Hence, the roots of IE are all the structures that are not contained in any other structure. Let $\ell(e)$ denote the label of the edge e, $I(\sigma_2)$ denote the set $\{\sigma_1 \in V : (\sigma_1 \to \sigma_2) \in E\}$, and $T_p(\sigma)$ denote the number of trees of size n labeled according to p ∈ Π(k−1) that contain σ but no other structure σ′ ∈ I(σ) as a subtree. It is easy to prove by induction that $T_p(\sigma_2) = O_p(\sigma_2) - \sum_{\sigma_1 \in I(\sigma_2)} \ell(\sigma_1 \to \sigma_2)\, T_p(\sigma_1)$. Figure 3 shows a subgraph of IE for k=4. A simple bottom-up procedure traverses the graph and computes $T_p(u)$ for every vertex u ∈ V and every partition p ∈ Π(k−1).
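The recursion over the inclusion-exclusion graph can be sketched with memoization. Everything in the example is an assumption made for illustration: the function name, the dictionary encoding of IE, and in particular the numbers, which are made up and do not come from the paper's data.

```python
from functools import lru_cache

def exclusive_counts(occurrences, contains):
    """Apply T(s2) = O(s2) - sum over s1 in I(s2) of label(s1 -> s2) * T(s1).

    occurrences: dict structure -> O_p(structure)
    contains: dict (s1, s2) -> c, meaning s1 contains c occurrences of s2
    """
    containers = {}
    for (s1, s2), c in contains.items():
        containers.setdefault(s2, []).append((s1, c))

    @lru_cache(maxsize=None)
    def t(s2):
        return occurrences[s2] - sum(c * t(s1)
                                     for s1, c in containers.get(s2, []))

    return {s: t(s) for s in occurrences}

# Made-up toy numbers, only to exercise the recursion.
toy_o = {"A": 5, "B": 20, "C": 50}
toy_edges = {("A", "B"): 2, ("A", "C"): 3, ("B", "C"): 2}
toy_t = exclusive_counts(toy_o, toy_edges)
```

In the toy graph, A is a root (nothing contains it), so T(A) = 5; then T(B) = 20 − 2·5 = 10 and T(C) = 50 − 3·5 − 2·10 = 15. Memoization ensures each vertex is evaluated once, regardless of how many structures it contains.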

Trees
Similarly to DX(n, d), let DT(n, d) denote the number of fixed tree polycubes of size n that are proper in d dimensions. Every tree polycube gives rise to a unique spanning tree, and every directed tree labeled according to p ∈ Π(k−1) corresponds to a tree polycube in $\mathcal{P}_n$ unless it contains a structure $\sigma \in DS_k$ as a subtree. (A spanning tree of a tree polycube can neither contain coinciding cells because these are illegal, nor can it contain neighboring cells.) Thus, we exclude all the trees that contain any structure $\sigma \in DS_k$ as a subtree, obtaining that $DT(n, n-k) = \sum_{p \in \Pi(k-1)} [\ldots]$. (The division by [...] is because each tree polycube is counted that many times.)

Nontrees
Let C(k) denote the set of all cycle structures of polycubes proper in n−k dimensions. This set can be found using $DS_k$: A distinguished structure is a spanning tree of a cycle if it contains only neighboring cells and no coinciding cells. For example, the structure shown in Figure 2(e) is a spanning tree of the cycle shown in Figure 2(h). For any $C_i \in C(k)$, let $P_{C_i}$ denote the number of polycubes $P \in \mathcal{P}_n$ that contain $C_i$ in γ(P). Suppose that a distinguished structure $\sigma \in DS_k$ has c occurrences in $C_i$. Then, $P_{C_i} = \sum_{p \in \Pi(k-1)} [\ldots]$. Finally, $DX(n, n-k) = DT(n, n-k) + \sum_{i=1}^{|C(k)|} P_{C_i}$.

Results
The entire method was implemented in a C++ program, using Mathematica for simplifying the final formula. Our results agree completely with the formulae conjectured in the literature of statistical physics. For k=3, the program found 147 distinguished structures and 13 cycle structures. For k=4, the program found 8,397 distinguished structures and 179 cycles. The parallel computation took about 15 minutes on a supercomputer with 12 processors and 64 GB of RAM. The program produced data files which document the entire computation, serving as a proof of Theorem 1.1.