Ferromagnetic Potts Model: Refined #BIS-hardness and Related Results

Recent results establish for 2-spin antiferromagnetic systems that the computational complexity of approximating the partition function on graphs of maximum degree D undergoes a phase transition that coincides with the uniqueness phase transition on the infinite D-regular tree. For the ferromagnetic Potts model we investigate whether analogous hardness results hold. Goldberg and Jerrum showed that approximating the partition function of the ferromagnetic Potts model is at least as hard as approximating the number of independent sets in bipartite graphs (#BIS-hardness). We improve this hardness result by establishing it for bipartite graphs of maximum degree D. We first present a detailed picture of the phase diagram for the infinite D-regular tree, giving a refined description of its first-order phase transition and establishing the critical temperature for the coexistence of the disordered and ordered phases. We then prove, for all temperatures below this critical temperature, that it is #BIS-hard to approximate the partition function on bipartite graphs of maximum degree D. As a corollary, it is #BIS-hard to approximate the number of k-colorings on bipartite graphs of maximum degree D when k ≤ D/(2 ln D). The #BIS-hardness result for the ferromagnetic Potts model uses random bipartite regular graphs as a gadget in the reduction. The analysis of these random graphs relies on recent connections between the maxima of the expectation of their partition function, attractive fixpoints of the associated tree recursions, and induced matrix norms. We extend these connections to random regular graphs for all ferromagnetic models and establish the Bethe prediction for every ferromagnetic spin system on random regular graphs. We also prove for the ferromagnetic Potts model that the Swendsen-Wang algorithm is torpidly mixing on random D-regular graphs at the critical temperature for large q.


Spin Systems
We study the ferromagnetic Potts model and present tools which are useful for any ferromagnetic spin system on random regular graphs. Hence we begin with a general definition of a spin system.
A spin system is defined, for an n-vertex graph G = (V, E) and integer q ≥ 2, on the space Ω of configurations σ which are assignments σ : V → [q]. The model is characterized by its energy or Hamiltonian H(σ) which is a function of the spin assignments to the vertices. In the classical examples of the Ising (q = 2) and Potts (q ≥ 3) models without external field, the Hamiltonian H(σ) is the number of monochromatic edges in σ. Each configuration has a weight w(σ) = exp(−βH(σ)) for a parameter β corresponding to the "inverse temperature" which controls the strength of edge interactions.
In our general setup, a specification of a q-state spin model is defined by a symmetric q × q interaction matrix B = {B_ij}_{i,j∈[q]} with non-negative entries. For a graph G = (V, E), the weight of a configuration σ : V → [q] is given by:

w_G(σ) = ∏_{(u,v)∈E} B_{σ(u)σ(v)}.

We will occasionally drop the subscript G when the graph under consideration is clear from context. The Gibbs distribution µ = µ_G is defined as µ(σ) = w(σ)/Z, where Z = Z_G(B) = Σ_σ w(σ) is the partition function. We remark here that many of our results also apply to models with arbitrary external fields since we will work with ∆-regular graphs and in this case the external field can be incorporated into the interaction matrix.
The Ising (q = 2) and Potts (q > 2) models have interaction matrices with diagonal entries B := exp(−β) and off-diagonal entries 1. The models are called ferromagnetic if B > 1, since then neighboring spins prefer to align, and antiferromagnetic if B < 1. The hard-core model is an example of a 2-spin antiferromagnetic system; its interaction matrix is defined so that Ω is the set of independent sets of G and, for activity (external field) λ > 0, a configuration σ ∈ Ω has weight w(σ) = λ^{|σ|} (with |σ| denoting the cardinality of the independent set σ).
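As a concrete illustration (ours, not from the paper), the partition function of a small instance can be computed by brute-force enumeration over all q^n configurations; the graph, q, and B below are arbitrary choices.

```python
from itertools import product

def partition_function(n, edges, q, B):
    """Brute-force Z = sum over all q^n configurations of the product of edge weights."""
    Z = 0.0
    for sigma in product(range(q), repeat=n):
        w = 1.0
        for (u, v) in edges:
            w *= B[sigma[u]][sigma[v]]
        Z += w
    return Z

# Ferromagnetic 3-state Potts model (B = 2 on the diagonal, 1 off-diagonal) on a 4-cycle.
q = 3
B = [[2.0 if i == j else 1.0 for j in range(q)] for i in range(q)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
Z = partition_function(4, edges, q, B)  # → 258.0, since Z = Tr(B^4) = 4^4 + 1 + 1 on a cycle
```

For a cycle, Z equals the trace of the ∆-th matrix power of B (transfer-matrix method), which gives an independent check of the enumeration.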

Ferromagnetic Models
In this paper, we will focus on ferromagnetic models, and pay special attention to the ferromagnetic Potts model. We are not aware of a general definition of ferromagnetic and antiferromagnetic models. We use the following notions which generalize the analogous notions for 2-spin and for the Potts model. The ferromagnetic definition captures that neighboring spins prefer to align (see Observation 1 below).
To avoid degenerate cases, we assume throughout this paper that the interaction matrix B is ergodic, that is, irreducible and aperiodic. Hence, by the Perron-Frobenius theorem (since B has non-negative entries), the eigenvalue of B with the largest magnitude is positive. (If B is reducible, by a suitable permutation of the labels of the spins, B can be put in a block diagonal form where each of the blocks is either irreducible or zero. Such a model can be studied by considering the induced sub-models of each block corresponding to irreducible symmetric matrices, since the partition function for the original model is simply the sum of the partition functions of each of these sub-models. If B is periodic, then, since B is symmetric, its period must be two. Such a model is only interesting on bipartite graphs (otherwise the partition function is zero), and the focus of our general results is random ∆-regular graphs, which are non-bipartite with high probability.)

Definition 1. A model is called ferromagnetic if B is positive definite. Equivalently, all of its eigenvalues are positive, and also B = B̃ᵀB̃ for some q × q matrix B̃.
In contrast to the above notion of a ferromagnetic system, in [17] a model is called antiferromagnetic if all of the eigenvalues of B are negative except for the largest (which, as noted above, is positive). Note, when the number of spins is greater than 2, there are models which are neither ferromagnetic nor antiferromagnetic.
The most alluring aspect of this definition is that for ferromagnetic models, neighboring vertices prefer to have the same spin. To see this, the following more general inequality is proved in [17]; it is a simple application of the Cauchy-Schwarz inequality.

Observation 1. Let z_1, z_2 ∈ R^q_{≥0} with ‖z_1‖_1 = ‖z_2‖_1 = 1. For ferromagnetic B, we have

z_1ᵀ B z_2 ≤ √( (z_1ᵀ B z_1)(z_2ᵀ B z_2) ).

Equality holds iff z_1 = z_2. For antiferromagnetic B, the inequality is reversed.
Observe that if we plug into the above inequality the standard basis vectors e_i and e_j (a single 1 in positions i and j, respectively), we obtain B_ij < √(B_ii B_jj) for i ≠ j, i.e., B_ii B_jj > B_ij²; thus any two spins i, j induce a ferromagnetic two-spin system.
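A quick numerical sanity check (our own illustration) of the Cauchy-Schwarz form of this inequality for a positive definite interaction matrix:

```python
import math, random

# Ferromagnetic 3-state Potts interaction matrix (B > 1 on the diagonal):
# positive definite, so z1ᵀBz2 ≤ sqrt(z1ᵀBz1 · z2ᵀBz2) for all z1, z2 ≥ 0.
q, Bdiag = 3, 2.0
B = [[Bdiag if i == j else 1.0 for j in range(q)] for i in range(q)]

def quad(x, y):
    """Bilinear form xᵀBy."""
    return sum(x[i] * B[i][j] * y[j] for i in range(q) for j in range(q))

random.seed(0)
for _ in range(10000):
    # Random probability vectors (non-negative, ℓ1-normalized).
    z1 = [random.random() for _ in range(q)]; s1 = sum(z1); z1 = [t / s1 for t in z1]
    z2 = [random.random() for _ in range(q)]; s2 = sum(z2); z2 = [t / s2 for t in z2]
    assert quad(z1, z2) <= math.sqrt(quad(z1, z1) * quad(z2, z2)) + 1e-12
```

Plugging in basis vectors reproduces the pairwise condition B_ii B_jj ≥ B_ij² discussed above.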
As observed in [17], an appealing aspect of defining ferromagnetism in terms of the signature of the interaction matrix is that the definition remains invariant in the presence of external fields. More precisely, for ∆-regular graphs, any external field can be incorporated into the interaction matrix by a congruence transformation of the matrix B. The modified interaction matrix has the same number of positive, zero and negative eigenvalues as the original (this follows from Sylvester's law of inertia), and hence it remains ferromagnetic/antiferromagnetic.

Known Connections to Phase Transitions
Exact computation of the partition function is #P-complete, even for very restricted classes of graphs [24]. Hence we focus on whether there is a fully-polynomial (randomized or deterministic) approximation scheme, a so-called FPRAS or FPTAS.
One of our goals in this paper is to refine our understanding of the connections between approximating the partition function on graphs of maximum degree ∆ and phase transitions on the infinite ∆-regular tree T_∆. A phase transition of particular interest on T_∆ is the uniqueness/non-uniqueness threshold. Roughly speaking, fix a so-called "boundary condition", which is a configuration σ_ℓ (for instance, an independent set in the hard-core model) on the vertices at distance ℓ from the root, and consider the Gibbs distribution conditioned on this configuration. If, for all sequences (σ_ℓ) of boundary conditions, the marginal distribution at the root converges to the same limit as ℓ → ∞, then there is a unique Gibbs measure on the infinite tree and we say the model is in the uniqueness region. If there are sequences of boundary conditions which influence the root in the limit, we say the model is in the non-uniqueness region.
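This boundary-condition influence can be illustrated numerically (our own sketch, not from the paper) with the hard-core model, whose tree recursion for the occupancy ratio at the root is R ↦ λ/(1+R)^{∆−1}; the uniqueness threshold on T_∆ is λ_c(∆) = (∆−1)^{∆−1}/(∆−2)^∆.

```python
def root_ratio(lam, delta, start, levels=500):
    """Iterate the hard-core tree recursion R -> lam/(1+R)^(delta-1),
    starting from the ratio induced at the boundary."""
    R = start
    for _ in range(levels):
        R = lam / (1.0 + R) ** (delta - 1)
    return R

delta = 6
lam_c = (delta - 1) ** (delta - 1) / (delta - 2) ** delta  # ≈ 0.7629 for delta = 6
for lam in (0.5, 1.0):
    # "Even" vs "odd" boundary: iterating one extra level flips the boundary parity.
    even = root_ratio(lam, delta, 0.0)
    odd = root_ratio(lam, delta, 0.0, levels=501)
    print(lam, abs(even - odd))
```

Below λ_c the two limits coincide (a unique fixpoint attracts both); above λ_c the iterates settle into a two-cycle, so the root marginal depends on the boundary parity.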
For 2-spin antiferromagnetic spin systems, it was shown that there is an FPTAS for estimating the partition function for graphs of maximum degree ∆ when the infinite tree T_∆ is in the uniqueness region [32]. On the other hand, unless NP=RP, there is no FPRAS for the partition function for ∆-regular graphs when T_∆ is in the non-uniqueness region [44] (see also [16]). Recently, an analogous NP-hardness result was shown for approximating the number of k-colorings on triangle-free ∆-regular graphs for even k when k < ∆ [17]. In contrast to the above inapproximability results for antiferromagnetic systems, for the ferromagnetic Ising model with or without external field [29] and for 2-spin ferromagnetic spin systems without external field [23] there is an FPRAS for all graphs. The situation for ferromagnetic multi-spin models, the ferromagnetic Potts model being the most prominent example, is more intricate.
#BIS refers to the problem of computing the number of independent sets in bipartite graphs. A series of results has presented evidence that a polynomial-time algorithm for #BIS is unlikely, since a number of unresolved counting problems have been shown to be #BIS-easy (for example, see [14,3,8]). This growing body of evidence suggests that #BIS is intractable, though its hardness appears weaker than NP-hardness. More recently, it was shown in [7] that for antiferromagnetic 2-spin models it is #BIS-hard to approximate the partition function on bipartite graphs of maximum degree ∆ when the parameters of the model lie in the non-uniqueness region of the infinite ∆-regular tree T_∆. Also, for ferromagnetic 2-spin models with external field, [33] shows #BIS-hardness for some region of the parameter space (note, the known regions of the parameter space where an FPRAS exists, see [23,33,20], do not yet completely complement the #BIS-hardness result).

Outline of Results
Our focus in this paper is on understanding the behavior of ferromagnetic spin systems. Our main tools are random regular graphs, both bipartite and general. Whether we use bipartite or general graphs depends on the context, and we use whichever yields the strongest results. For instance, we establish #BIS-hardness for the ferromagnetic Potts model in Section 2.1; to obtain hardness results on the class of bipartite graphs we use bipartite random regular graphs as the core of the gadget. In Section 2.3 we establish results for the Swendsen-Wang algorithm; such results are more interesting for general graphs and hence we prove this result for random regular graphs.
In [17] we established concentration of the partition function for general spin systems on bipartite random regular graphs. At first glance the picture for random regular graphs is more complicated than for their bipartite counterparts, since the connection to trees is less clear for general models. However, for ferromagnetic models an analogous connection holds, as we establish in Section 3.2. For ferromagnetic systems, we establish concentration on random regular graphs as detailed in Section 3.1. As a consequence we obtain the so-called Bethe prediction for random regular graphs, also discussed in Section 3.1.

#BIS-hardness for the Potts model
Goldberg and Jerrum [21] showed that approximating the partition function of the ferromagnetic Potts model is #BIS-hard, hence it appears likely that the ferromagnetic Potts model is inapproximable for general graphs. We refine this #BIS-hardness result for the ferromagnetic Potts model. We prove that approximating the partition function for the ferromagnetic Potts model on bipartite graphs of maximum degree ∆ is #BIS-hard for temperatures below the appropriate phase transition point in the infinite tree T ∆ . The appropriate phase transition in the Potts model is not the uniqueness/non-uniqueness threshold, but rather it is the ordered/disordered phase transition which occurs at B = B o as explained in the next section.
Formally, we study the following problem.

Name. #BipFerroPotts(q,B,∆).
Instance. A bipartite graph G with maximum degree ∆.
Output. The partition function for the q-state Potts model on G.
We use the notion of approximation-preserving reductions, denoted ≤_AP, formally defined in [14] (roughly, for counting problems #Π_1 and #Π_2, #Π_1 ≤_AP #Π_2 implies that the existence of an FPRAS for #Π_2 implies the existence of an FPRAS for #Π_1). We can now formally state our main result.
Theorem 2. For all q ≥ 3, all ∆ ≥ 3, for the ferromagnetic q-state Potts model, for any B > B_o, where B_o is given by (3), it holds that #BIS ≤_AP #BipFerroPotts(q,B,∆).

Theorem 2 has a simple, yet interesting, consequence for the problem of approximately counting k-colorings on bipartite graphs of maximum degree ∆. Recall that for general graphs of maximum degree ∆, approximately counting k-colorings is NP-hard whenever k < ∆ (and k is even) due to the result of [17]. Theorem 2 yields #BIS-hardness for bipartite k-colorings whenever k ≤ ∆/(2 log ∆). Formally, we are interested in the following problem.

Name. #BipColorings(k,∆).
Instance. A bipartite graph G with maximum degree ∆.
Output. The number of proper k-colorings of G.
We use a relatively simple reduction from the ferromagnetic Potts model to bipartite colorings, first observed in [14], which works even for bounded-degree graphs. Theorem 2 then yields the following corollary (which is proved in Section 9.2).

Potts Model Phase Diagram
To understand the critical point B o we need to delve into the nature of the phase transition in the ferromagnetic Potts model on the infinite ∆-regular tree T ∆ . We focus on how the phase transition manifests on a random ∆-regular graph.
We refer to α ∈ △_q as a phase. For a phase α, denote the set of configurations with frequencies of colors given by α as:

Σ_α = { σ : V → {1, . . . , q} : |σ^{−1}(i)| = α_i n for i = 1, . . . , q },

and denote by Z^α_G the partition function restricted to these configurations. Let G denote the uniform distribution over ∆-regular graphs with n vertices (for ∆n even). Denote the exponent of the first moment as:

Ψ_1(α) := lim_{n→∞} (1/n) log E_G[Z^α_G].

We derive the expression for Ψ_1 in Section 4. Those α which are global maxima of Ψ_1 we refer to as dominant phases. We will see in Section 3.2 that, for all ferromagnetic models, roughly speaking, the candidates for dominant phases correspond to stable fixpoints of the so-called tree recursions.
For the ferromagnetic Potts model, two types of phases will be of particular interest; we refer to them as the disordered phase and the ordered phases. The disordered phase is the uniform vector α = (1/q, . . . , 1/q). An ordered phase is a phase with one dominating color in the following sense: one coordinate is equal to some a > 1/q and the other q − 1 coordinates are equal to (1 − a)/(q − 1). Due to the symmetry of the Potts model, when an ordered phase dominates, in fact all q symmetric ordered phases dominate. These ordered phases have a specific a = a(q, B, ∆) which corresponds to a fixpoint of the tree recursions. The exact definition of this marginal a is not important at this stage, and hence we defer it to the more detailed discussion in Section 8 (see equation (38)).
One of the difficulties for the Potts model is that the nature of the uniqueness/non-uniqueness phase transition on T_∆ is inherently different from that of the Ising model. The ferromagnetic Ising model undergoes a second-order phase transition on T_∆ which manifests itself on random ∆-regular graphs in the following manner: in the uniqueness region the disordered phase dominates, and in the non-uniqueness region the two ordered phases dominate.
In contrast, the ferromagnetic Potts model undergoes a first-order phase transition at the critical activity B u . For B < B u there is a unique Gibbs measure on T ∆ . For B ≥ B u there are multiple Gibbs measures on T ∆ , however there is a second critical activity B o corresponding to the disordered/ordered phase transition: for B ≤ B o the disordered phase dominates, and for B ≥ B o the ordered phases dominate (and at the critical point B o all of these q + 1 phases dominate).
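A numerical sketch (ours, with arbitrary parameter choices) of this first-order behavior: restricted to phases of the form R = (r, 1, . . . , 1), the Potts tree recursion reduces to the one-dimensional map r ↦ ((Br + q − 1)/(r + B + q − 2))^{∆−1}. The value r = 1 (disordered) is always a fixpoint, and for B large enough an ordered fixpoint r > 1 appears alongside it.

```python
def potts_map(r, B, q, delta):
    """One-dimensional Potts tree recursion for the ratio r = R_1 / R_j (j != 1)."""
    return ((B * r + q - 1) / (r + B + q - 2)) ** (delta - 1)

def limit_from_above(B, q=10, delta=3, iters=2000):
    """Iterate from a large ordered start; detects an ordered fixpoint if one exists."""
    r = 100.0
    for _ in range(iters):
        r = potts_map(r, B, q, delta)
    return r

print(limit_from_above(2.0))  # small B: falls back to the disordered fixpoint r = 1
print(limit_from_above(8.0))  # large B: sticks at an ordered fixpoint r >> 1
```

In the coexistence window both fixpoints are locally attractive, which is the hallmark of a first-order transition (for the Ising model, by contrast, the disordered fixpoint loses stability continuously).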
We present a detailed picture of the phase diagram for the ferromagnetic Potts model. Previously, Häggström [26] established the uniqueness threshold B u by studying percolation in the random cluster representation. In addition, Dembo et al. [12,13] studied the ferromagnetic Potts model (including the case with an external field) and proved that for B > B u , either the disordered or the q ordered phases are dominant, but they did not establish the precise regions where each phase dominates. For the simpler case of the complete graph (known as the Curie-Weiss model), [10] detailed the phase diagram.
Häggström [26] established that the uniqueness/non-uniqueness threshold for the infinite tree T_∆ occurs at B_u, which is the unique value of B for which the following polynomial has a double root in (0, 1):

The disordered phase is dominant in the uniqueness region and continues to dominate until the following activity B_o (which was considered by Peruggi et al. [41]):

Finally, Häggström [26] considers the following activity B_rc, which he conjectures is a (second) threshold for uniqueness of the random-cluster model, defined as:

We prove the following picture for the phase diagram for the ferromagnetic Potts model in Section 8. Note that, to prove that a function has a local maximum at a point, a standard approach is to show that its Hessian matrix is negative definite. We often need this stronger condition in our proofs, hence we use the following definition: those dominant phases α with negative definite Hessian are called Hessian dominant phases. Note that dominant phases always exist, but a dominant phase can fail to be Hessian (when some eigenvalue of the underlying Hessian is equal to zero). In Section 3.2, we give an alternative formulation of the Hessian condition in terms of the local stability of fixpoints of the tree recursions.

B_u < B < B_rc: The local maxima of Ψ_1 are the disordered phase u and the q ordered phases (the ordered phases are permutations of each other). All of these q + 1 phases are Hessian local maxima. Moreover:

Swendsen-Wang Algorithm
An algorithm of particular interest for the ferromagnetic Potts model is the Swendsen-Wang algorithm, an ergodic Markov chain whose stationary distribution is the Gibbs distribution. It utilizes the random-cluster representation (see [25]) of the Potts model to potentially overcome the "bottlenecks" that obstruct the simpler Glauber dynamics in the non-uniqueness region. As a consequence of the above picture for the phase diagram on the infinite tree T_∆ and our tools for analyzing random regular graphs, we can prove torpid mixing of the Swendsen-Wang algorithm at the disordered/ordered phase transition point B_o. (Torpid mixing means that the mixing time is exponentially large.) The algorithm is formally defined as follows. From a configuration X_t ∈ Ω:
1. Let M be the set of monochromatic edges in X_t.
2. For each edge e ∈ M , delete it with probability 1/B. Let M ′ denote the set of monochromatic edges that were not deleted.
3. In the graph (V, M ′ ), for each connected component, choose a color uniformly at random from [q] and assign all vertices in that component the chosen color. Let X t+1 denote the resulting spin configuration.
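The steps above admit a direct implementation; the sketch below is ours (the edge-list graph representation and the union-find helper are illustrative choices, not from the paper).

```python
import random

def swendsen_wang_step(colors, edges, q, B, rng):
    """One Swendsen-Wang update: keep each monochromatic edge w.p. 1 - 1/B,
    then recolor each connected component of kept edges uniformly at random."""
    n = len(colors)
    parent = list(range(n))  # union-find over vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (u, v) in edges:
        # A monochromatic edge is deleted with probability 1/B, i.e. kept w.p. 1 - 1/B.
        if colors[u] == colors[v] and rng.random() >= 1.0 / B:
            parent[find(u)] = find(v)

    component_color = {}
    new_colors = []
    for v in range(n):
        r = find(v)
        if r not in component_color:
            component_color[r] = rng.randrange(q)
        new_colors.append(component_color[r])
    return new_colors

rng = random.Random(42)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle
colors = [0, 0, 1, 2]
colors = swendsen_wang_step(colors, edges, 3, 2.0, rng)
```

Note that for very large B almost all monochromatic edges survive, so large monochromatic regions are recolored as single blocks; this is exactly the mechanism that is blocked at the coexistence point B_o, where the chain must cross between disordered and ordered configurations.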
There are few results establishing rapid mixing of the Swendsen-Wang algorithm beyond what is known for the Glauber dynamics; see [47] for recent progress showing rapid mixing on the 2-dimensional lattice. However, there are several results establishing torpid mixing of the Swendsen-Wang algorithm at a critical value for the q-state ferromagnetic Potts model: on the complete graph (q ≥ 3) [22], on Erdős-Rényi random graphs (q ≥ 3) [9], and on the d-dimensional integer lattice Z^d (q sufficiently large) [4,5].
Using our detailed picture of the phase diagram of the ferromagnetic Potts model and our generic second moment analysis for ferromagnetic models on random regular graphs which we explain in a moment, we establish torpid mixing on random ∆-regular graphs at the phase coexistence point B o .
Theorem 5. For all ∆ ≥ 3 and q ≥ 2∆/ log ∆, with probability 1−o(1) over the choice of a random ∆-regular graph, for the ferromagnetic Potts model with B = B o , the Swendsen-Wang algorithm has mixing time exp(Ω(n)).
We believe that the lower bound on q in Theorem 5 is an artifact of our proof, see Remark 8 in Section 10 for details.

Second Moment and Bethe prediction
We analyze the Gibbs distribution on random ∆-regular graphs using second moment arguments. The challenging aspect of the second moment is determining the phase that dominates, as we will describe more precisely momentarily. In a straightforward analysis of the second moment, this reduces to an optimization problem over q^4 variables for a complicated expression. Even for q = 2, tackling this requires significant effort (see, for example, [40] for the hard-core model).
In a recent paper [17] we analyzed antiferromagnetic systems on bipartite random ∆-regular graphs, to use as gadgets for inapproximability results. In that work we presented a new approach for simplifying the analysis of the second moment for antiferromagnetic models using the theory of matrix norms. In this paper we extend that approach to the second moment for ferromagnetic systems on (non-bipartite) random ∆-regular graphs. We obtain a short, elegant proof that the exponential order of the second moment is twice the exponential order of the first moment.
Denote the leading term of the second moment as:

Ψ_2(α) := lim_{n→∞} (1/n) log E_G[(Z^α_G)^2].

Our main technical result is the analysis of the second moment for ferromagnetic models. We will relate the maximum of the second moment to the maximum of the first moment. To analyze the second moment we need to determine the phase α that maximizes Ψ_2. We will first show how to re-express the critical points of Ψ_1 in a form that can be readily phrased in terms of matrix norms (see Section 5.1). Then, using the Cholesky decomposition of the interaction matrix B and properties of matrix norms, we will show that the second moment is maximized at a phase which is a tensor product of the dominant phases of the first moment. This results in the following theorem, which is proved in Section 5.2.
Combining Theorem 6 with an elaborate variance analysis known as the small subgraph conditioning method allows us to obtain a lower bound on Z^α_G which matches its expectation up to a polynomial factor (see Lemma 11). In particular, we verify the so-called Bethe prediction (see [13,12]) for general ferromagnetic models on random ∆-regular graphs, which is captured in our setting by equation (5) in the following theorem (the proof is in Section 7.2). Our interest in the quantity E_G[log Z_G] stems from the fact that it gives information about the typical configurations on a random regular graph; Theorem 7 gives its value in terms of a much simpler quantity, log E_G[Z_G], which can be calculated much more easily, see Section 4. (The equality in (5) is closely related to the cavity method, see [36].)

Theorem 7. Let B specify a ferromagnetic model. Then, if there exists a Hessian dominant phase, it holds that

lim_{n→∞} (1/n) E_G[log Z_G] = lim_{n→∞} (1/n) log E_G[Z_G].   (5)

Note that for a ferromagnetic model the interaction matrix B is positive definite and hence the entries on the diagonal are all positive. Thus Z_G is always positive for every graph G (and hence log Z_G in (5) is well-defined).
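The two quantities in the Bethe prediction can be compared empirically (our own illustration on very small instances; at such sizes only the Jensen direction, log E[Z] ≥ E[log Z], is visible):

```python
import math, random
from itertools import product

def potts_Z(n, edges, q, B):
    """Brute-force Potts partition function: diagonal activity B, off-diagonal 1."""
    Z = 0.0
    for sigma in product(range(q), repeat=n):
        w = 1.0
        for (u, v) in edges:
            if sigma[u] == sigma[v]:
                w *= B
        Z += w
    return Z

def random_cubic_multigraph(n, rng):
    """Pairing model G(n, 3): a random perfect matching on 3n points, collapsed."""
    points = list(range(3 * n))
    rng.shuffle(points)
    return [(points[2 * i] // 3, points[2 * i + 1] // 3) for i in range(3 * n // 2)]

rng = random.Random(7)
n, q, B = 8, 2, 1.5  # a tiny ferromagnetic Ising-type instance (arbitrary choice)
logZ, Zs = [], []
for _ in range(50):
    Z = potts_Z(n, random_cubic_multigraph(n, rng), q, B)
    logZ.append(math.log(Z)); Zs.append(Z)
lhs = sum(logZ) / len(logZ) / n        # empirical (1/n) E[log Z]
rhs = math.log(sum(Zs) / len(Zs)) / n  # empirical (1/n) log E[Z]
```

Theorem 7 asserts that, in the limit n → ∞, the inequality rhs ≥ lhs becomes an equality for ferromagnetic models.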
Theorem 7 holds for all ferromagnetic models at any temperature. Dembo et al. [13] consider general factor models on graph sequences converging locally to trees and verify the Bethe prediction when the underlying tree is in the uniqueness regime. In [12], the case of the ferromagnetic Potts model (with external field) is considered for graph sequences converging locally to trees and they obtain a general formula for the logarithm of the partition function.
Perhaps the most important conceptual content of Theorem 7 is that it shows that all ferromagnetic models, at any temperature, do not exhibit the complex behavior that other spin models, such as colorings or the antiferromagnetic Potts model, exhibit on random (regular) graphs. In particular, when the equality in (5) fails, we have the so-called condensation regime, and in that case calculating (1/n) E_G[log Z_G] is a far more intricate task (see the recent works [1,45]). Theorem 7 can be extended to general models (not necessarily ferromagnetic) on random ∆-regular graphs under the stronger assumption that there is a unique semi-translation invariant Gibbs measure on T_∆. In this setting, one also obtains the analogue of Theorem 6 and, as a consequence, concentration of Z^α_G for the (unique) dominant phase α, which can be used to verify (in complete analogy) the Bethe prediction; see Section 11.3 for more details.

Connection to Tree Recursions
As a consequence of Theorem 6, to analyze ferromagnetic models on random regular graphs, one only needs to analyze the first moment. To simplify the analysis of the first moment, we establish the following connection to the so-called tree recursions. An analogous connection was established in [17] for antiferromagnetic models on random bipartite ∆-regular graphs.
A key concept is the following set of recursions corresponding to the partition function on trees; we refer to them as the (depth one) tree recursions. The fixpoints of the tree recursions are those R = (R_1, . . . , R_q) such that, for some λ > 0,

λ R_i = ( Σ_j B_ij R_j )^{∆−1}   for all i ∈ [q].

We refer to a fixpoint R of the tree recursions as Jacobian attractive if the Jacobian at R has spectral radius less than 1. We prove the following theorem detailing the connections between the tree recursions and the critical points of the partition function for random regular graphs.
Theorem 8. Assume that the model is ferromagnetic. Jacobian attractive fixpoints of the (depth one) tree recursions are in one-to-one correspondence with the Hessian local maxima of Ψ 1 .
The above connection fails for antiferromagnetic models, e.g., for the antiferromagnetic Potts model the uniform distribution is a global maximum but it is not a stable fixpoint of the tree recursions for small enough temperature. (In fact, for antiferromagnetic models every solution of the tree recursions is a local maximum, see Remark 3.) Using the above connection we establish the detailed picture for the dominant phases of the ferromagnetic Potts model as stated in Theorem 4.
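A numerical sketch (ours) of the Jacobian condition: iterate the normalized tree recursions to a fixpoint, then estimate the spectral radius of the Jacobian by finite differences. The parameters below (ferromagnetic Potts with q = 3, ∆ = 3, B = 2, which lies in the uniqueness region) are an illustrative choice; there the disordered fixpoint has projective Jacobian eigenvalue (∆−1)(B−1)/(B+q−1) = 0.5.

```python
def tree_step(R, B, delta):
    """Normalized depth-one tree recursion: R_i <- (sum_j B_ij R_j)^(delta-1), rescaled."""
    q = len(R)
    S = [sum(B[i][j] * R[j] for j in range(q)) ** (delta - 1) for i in range(q)]
    total = sum(S)
    return [s / total for s in S]

def spectral_radius(J, iters=200):
    """Crude power-iteration estimate of the spectral radius of a small matrix."""
    q = len(J)
    v = [1.0] + [0.5] * (q - 1)
    rho = 0.0
    for _ in range(iters):
        w = [sum(J[i][j] * v[j] for j in range(q)) for i in range(q)]
        rho = max(abs(x) for x in w) / max(abs(x) for x in v)
        m = max(abs(x) for x in w) or 1.0
        v = [x / m for x in w]
    return rho

q, delta, Bd = 3, 3, 2.0
B = [[Bd if i == j else 1.0 for j in range(q)] for i in range(q)]
R = [0.6, 0.2, 0.2]
for _ in range(1000):
    R = tree_step(R, B, delta)  # converges to the uniform (disordered) fixpoint

eps = 1e-6  # finite-difference Jacobian of the normalized map at the fixpoint
J = [[(tree_step([R[k] + (eps if k == j else 0.0) for k in range(q)], B, delta)[i] - R[i]) / eps
      for j in range(q)] for i in range(q)]
rho = spectral_radius(J)  # < 1: the fixpoint is Jacobian attractive here
```

By Theorem 8, this numerically attractive fixpoint corresponds to a Hessian local maximum of Ψ_1.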

Expressions for Ψ 1 and Ψ 2
In this section, we derive expressions for the first and second moments of Z α G , which will allow us to derive explicit expressions for the functions Ψ 1 (α) and Ψ 2 (α). Similar expressions have appeared in [12, Section 2.1] in a slightly different form. Our exposition here is such that it provides a straightforward alignment with the analogous expressions in [17]. The minor differences are due to the model of ∆-regular random graphs, which in this paper is the pairing model G(n, ∆). We first specify the model of ∆-regular random graphs.
The distribution G(n, ∆) on ∆-regular multigraphs is generated by the following random process. For ∆n even, consider the set [∆n]. Elements of [∆n] will be called points. First, a random perfect matching of the ∆n points is sampled. Then, for i = 1, . . . , n, we identify the points ∆(i − 1) + 1, . . . , ∆i as a single vertex of a graph G. The edges of G are naturally induced by the edges of the random matching and hence every vertex has degree ∆. Note that G may contain parallel edges or self-loops. It is well known that any property which holds asymptotically almost surely for the pairing model (i.e., with probability 1 − o(1) as n → ∞) holds asymptotically almost surely for the uniform distribution on ∆-regular (simple) graphs as well, see for example [28]. This is going to be the case for our results.
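The pairing model is straightforward to sample; a minimal sketch (ours):

```python
import random

def pairing_model(n, delta, rng):
    """Sample a multigraph from G(n, delta): match the delta*n points uniformly
    at random, then collapse each block of delta consecutive points into one vertex."""
    assert (delta * n) % 2 == 0
    points = list(range(delta * n))
    rng.shuffle(points)  # a uniformly random perfect matching, read off in pairs
    return [(points[2 * k] // delta, points[2 * k + 1] // delta)
            for k in range(delta * n // 2)]

edges = pairing_model(10, 3, random.Random(0))
degree = [0] * 10
for (u, v) in edges:
    degree[u] += 1
    degree[v] += 1  # a self-loop contributes 2 to the degree, as usual
```

Every vertex has degree exactly ∆, though parallel edges and self-loops may occur, as noted above.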
Recall that △_t denotes the simplex

△_t = { x ∈ R^t : x_i ≥ 0, Σ_i x_i = 1 }.

Let G ∼ G(n, ∆) and denote by V the vertex set of G. For a configuration σ : V → {1, . . . , q}, we denote the set of vertices assigned color i by σ^{−1}(i). For α ∈ △_q with nα ∈ Z^q, let

Σ_α = { σ : V → {1, . . . , q} : |σ^{−1}(i)| = α_i n for i = 1, . . . , q },

that is, Σ_α is the set of configurations σ which assign α_i n vertices of V the color i, for each i ∈ [q]. We are interested in the total weight Z^α_G of configurations in Σ_α, namely

Z^α_G = Σ_{σ∈Σ_α} w_G(σ).

Note that Z^α_G is a random variable and, as indicated earlier, we will look at its moments E_G[Z^α_G] and E_G[(Z^α_G)^2]. We begin with the first moment. For σ ∈ Σ_α and i, j ∈ [q], let e_ij n denote the number of edges between vertices in σ^{−1}(i) and σ^{−1}(j). Clearly, e_ij = e_ji. It will be notationally convenient to reparameterize the variables e_ij as follows: for i ≠ j we set e_ij = ∆x_ij and for i = j we set e_ii = ∆x_ii/2. For future use, when G ∼ G(n, ∆), we denote by x_G(σ) the random vector (x_11, . . . , x_qq).
The number of perfect matchings on 2n points is (2n − 1)!! = (2n)!/(n! 2^n). Under the convention that 0^0 ≡ 1, we then obtain the expression (8) for E_G[Z^α_G], where the sum ranges over all the possible values of the random vector x_G(σ); in particular, x = (x_11, . . . , x_qq) satisfies the constraints (9), namely x_ij ≥ 0 and Σ_j x_ij = α_i for every i ∈ [q]. The first line in (8) accounts for the cardinality of Σ_α, while the second line is E_G[w_G(σ)] for a fixed σ ∈ Σ_α (by symmetry we may focus on any fixed σ): the first product is the number of ways to choose a partition of the points consistent with the values prescribed by x, the fraction is the probability that the random matching connects the points as prescribed, and the last product is the weight of the configuration σ conditioned on x. We next consider the second moment of Z^α_G. The desired expression may be derived analogously to (8).
The vector γ captures the overlap of the configurations σ_1, σ_2. Denote by e_ikjl n the number of edges between vertices in σ_1^{−1}(i) ∩ σ_2^{−1}(k) and vertices in σ_1^{−1}(j) ∩ σ_2^{−1}(l). We reparameterize as follows: for (i, k) ≠ (j, l) we set e_ikjl = ∆y_ikjl and for (i, k) = (j, l) we set e_ikjl = ∆y_ikjl/2.
Here the sums range over γ = (γ_11, . . . , γ_qq) and y = (y_1111, . . . , y_qqqq) satisfying the constraints (11). The sums in (8) and (10) are typically exponential in n. The most critical component of our arguments is to find the quantitative structure of the configurations which determine the exponential order of the moments. Formally, we study the limits of (1/n) log E_G[Z^α_G] and (1/n) log E_G[(Z^α_G)^2] as n → ∞. These limits can be derived from (8) and (10) using Stirling's approximation formula. In particular, we shall use that, for a constant c > 0 with cn even,

(1/n) log (cn)! = c log(cn) − c + o(1),    (1/n) log (cn − 1)!! = (c/2)(log(cn) − 1) + o(1).   (12)

Under the usual conventions that ln 0 ≡ −∞ and 0 ln 0 ≡ 0, these formulas remain correct even in the degenerate case c = 0. We now derive asymptotics for the first moment E_G[Z^α_G] in order to obtain the function Ψ_1(α), see equation (1). Applying (12) yields the expression (13), where Υ_1(α, x) := (∆ − 1)f_1(α) + ∆g_1(x), defined on the region (9). Completely analogously, for the second moment we obtain (14), where Υ_2(γ, y) := (∆ − 1)f_2(γ) + ∆g_2(y), defined on the region (11).
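The double-factorial asymptotics used in this Stirling-type step can be sanity-checked numerically; the closed form below is the (c/2)(log(cn) − 1) approximation, whose error is O(log n / n).

```python
import math

def log_double_factorial(m):
    """log((m-1)!!) for even m: sum of logs of the odd numbers 1, 3, ..., m-1."""
    return sum(math.log(k) for k in range(1, m, 2))

n, c = 2000, 2
exact = log_double_factorial(c * n) / n
approx = (c / 2) * (math.log(c * n) - 1)
print(exact, approx)  # the two agree up to O(log n / n)
```

This is the per-vertex rate at which the number of perfect matchings of the pairing model grows, and it is the source of the ∆-dependent coefficients in Υ_1 and Υ_2.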

Remark 1.
It is useful to think of the second moment as the first moment of a paired-spin model with interaction matrix B ⊗ B. Indeed, from (14), we can interpret B ij B kl as the activity between the paired spins (i, k) and (j, l), thus giving the desired alignment.

Critical Points and Matrix Norms
It will be useful to reformulate the function Ψ_1 into the following version, which preserves the critical points and readily yields a formulation in terms of matrix norms. Let Φ_1(R) be defined as in (15), where R = (R_1, . . . , R_q)ᵀ ≥ 0, i.e., R has non-negative entries. Let p := ∆/(∆ − 1). Note that (15) has the appealing form (16) in terms of ‖R‖_p = (Σ_{i=1}^q R_i^p)^{1/p}. This will allow us to use techniques from the area of matrix norms in our arguments, more specifically, results on induced matrix norms. The induced matrix norm ‖·‖_{p→q′} is defined as

‖A‖_{p→q′} := max_{x ≠ 0} ‖Ax‖_{q′} / ‖x‖_p.

Since we assume that B is ferromagnetic, we have B = B̃ᵀB̃ and hence we can write RᵀBR = ‖B̃R‖_2^2. The next lemma describes the connection between Φ_1 and Ψ_1. We note that Φ_1 is not a reparameterization of Ψ_1; however, they do agree at the critical points. This is sufficient for our purpose: to understand the maxima of Ψ_1 it is enough to understand the maxima of Φ_1. The maximization in (19) is the induced p → 2 matrix norm of B̃. The first equality in (19) follows from the fact that the maximum on the right-hand side of (16) is achieved for non-negative R (this follows from the fact that B has non-negative entries).

Lemma 9.
There is a one-to-one correspondence between the fixpoints of the tree recursions and the critical points of Φ_1 (both considered for R_i ≥ 0 in projective space, that is, up to scaling by a constant). The transformation R → α given in (20) yields a one-to-one correspondence between the critical points of Φ_1 and the critical points of Ψ_1 (in the region defined by α_i ≥ 0 and Σ_i α_i = 1). Moreover, the corresponding critical points R and α satisfy the stated relation. Finally, the local maxima of Φ_1 and Ψ_1 occur at critical points (that is, there are no local maxima on the boundary).
We omit the proof of Lemma 9 since it follows the proof of Theorem 4.1 in [17, Section 4]. In that paper we consider random ∆-regular bipartite graphs and, in analogy to Ψ_1(α) and Φ_1(R), we define Ψ_1(α, β) and Φ_1(R, C), respectively, where α, R now correspond to the left side of the bipartition and β, C to the right side. The expressions in our setting (random ∆-regular graphs) are identical to those in [17] (random ∆-regular bipartite graphs) after identifying α with β and R with C. In fact, the proof of Theorem 4.1 in [17] works almost verbatim in our case after this identification.

Second Moment Analysis
For ferromagnetic models, Lemma 9 allows us to reexpress the optimization problem associated with the first moment in terms of matrix norms.
A key fact is that Ψ 2 is given by a constrained first moment calculation on a "paired-spin" model where the interaction matrix in this model is the tensor product of the original interaction matrix with itself (see Remark 1 in Section 4). The second moment considers a pair of configurations, say σ and σ ′ , which are constrained to have a given phase α. We capture this constraint using a vector γ corresponding to the overlap between σ and σ ′ , in particular, γ ij is the number of vertices with spin i in σ and spin j in σ ′ .
Recall that Ψ_1^B indicates the dependence of the function Ψ_1 on the interaction matrix B; to simplify the notation, we drop the superscript when the matrix is B. More precisely, Ψ_2 is given by the optimization in (22), which is constrained to γ satisfying (23). Ignoring the two constraints in (23) can only increase the value of (22), and hence we obtain an upper bound on Ψ_2. For induced norms ‖·‖_{p→q′} with p ≤ q′, it is known (Proposition 10.1 in [2]) that these norms are multiplicative with respect to tensor products. Now we are ready to prove Theorem 6.
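The norm fact invoked from [2] is, in our reading, that ‖A ⊗ A‖_{p→q′} = ‖A‖²_{p→q′} when p ≤ q′. It can be checked directly in the special case p = q′ = 2, where the induced norm is the spectral norm and the identity holds exactly because the singular values of A ⊗ A are the pairwise products of the singular values of A. A minimal sketch with an arbitrary matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 3))  # arbitrary illustrative matrix

# For p = q' = 2 the induced p->q' norm is the spectral norm, and the
# singular values of A (x) A are all products of singular values of A,
# so ||A (x) A||_2 = ||A||_2^2.
lhs = np.linalg.norm(np.kron(A, A), 2)
rhs = np.linalg.norm(A, 2) ** 2
assert np.isclose(lhs, rhs)
print("||A (x) A||_2 = ||A||_2^2 verified numerically")
```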
Remark 2. We illustrate the necessity of the ferromagnetism assumption in Theorem 6 by giving an example of an antiferromagnetic model for which the second moment method fails (i.e., the second moment is larger than the square of the first moment by an exponential factor and therefore does not provide any useful concentration). Consider proper 3-colorings of random 10-regular graphs. As the size of the graph goes to infinity, the probability that it is 3-colorable goes to zero. The intuitive effect of this is that to achieve a large value in the "paired-spin" model it is better to correlate the coordinates so that they agree. In terms of Ψ_1 and Ψ_2, the maximum in the first moment is achieved for α_1 = α_2 = α_3 = 1/3 with Ψ_1 = 5 ln 2 − 4 ln 3 < 0. To obtain a lower bound on the maximum in the second moment we take γ_11 = γ_22 = γ_33 = 1/3, which yields Ψ_2 ≥ Ψ_1 > 2Ψ_1. The argument applies whenever Ψ_1 < 0 (for models whose interaction matrices have 0's and 1's). By continuity (taking small B in the antiferromagnetic Potts model), one can obtain an example of a model without hard constraints for which the second moment method fails.
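The numbers in the remark can be verified directly. For proper q-colorings of a random ∆-regular graph, the first-moment exponent at the uniform phase is ln q + (∆/2) ln(1 − 1/q) (a standard configuration-model computation), which for q = 3, ∆ = 10 equals 5 ln 2 − 4 ln 3 < 0:

```python
from math import log, isclose

q, Delta = 3, 10
# First-moment exponent at the uniform phase for proper q-colorings of a
# random Delta-regular graph: ln q + (Delta/2) ln(1 - 1/q).
psi1 = log(q) + (Delta / 2) * log(1 - 1 / q)

# Matches the closed form 5 ln 2 - 4 ln 3 from the remark, and is negative.
assert isclose(psi1, 5 * log(2) - 4 * log(3))
assert psi1 < 0

# The diagonal overlap gamma_ii = 1/3 gives the lower bound
# Psi_2 >= Psi_1 > 2*Psi_1, so the second moment exceeds the squared
# first moment by an exponential factor.
assert psi1 > 2 * psi1
print(f"psi1 = {psi1:.4f} < 0, and psi1 > 2*psi1")
```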

Connections
In this section we prove Theorem 8, which describes the connection between the stable fixpoints of the tree recursions and the local maxima of Ψ_1. Theorem 8 will then be a key tool in our proof of Theorem 4. The technical core of the argument relies on [17], where an analogous connection has been established for random bipartite regular graphs; the arguments here are a minor modification of that approach, adapted to random regular graphs. Our starting point is the one-to-one correspondence between fixpoints of the tree recursions and the critical points of Ψ_1 (see [40], and also [17]). We show, roughly, that the stability of a fixpoint is equivalent to the local maximality of the corresponding critical point. This is done by relating the Jacobian of the tree recursions at a fixpoint with the Hessian of Ψ_1 at the corresponding critical point. More precisely, we show that the Jacobian has spectral radius less than 1 (a sufficient condition for stability) if and only if the Hessian is negative definite (a sufficient condition for local maximality). Both constraints on the matrices are independent of the choice of local coordinates (that is, they are invariant under similarity transformations); however, to make the connection between the Jacobian and the Hessian apparent, we have to choose the local coordinates very carefully. A further technical complication is that the tree recursions act on projective space and that the optimization of Ψ_1 is constrained.
We give a high-level overview of the Jacobian; the proofs for the ∆-regular case follow the same reasoning as for the bipartite ∆-regular case, see [17, Section 4.2.2], after simply changing the C_j's to R_j's and the β_j's to α_j's. Assume that R_1, . . . , R_q is a fixpoint of the tree recursions, and consider an infinitesimal perturbation R_1 + εR′_1, . . . , R_q + εR′_q of the fixpoint; we examine how it is mapped by the tree recursions. The tree recursions map the perturbation (in projective space) as in (26), where the r̂_i's are given by the linear transformation (27) and the r_i's are required to satisfy (28). The condition (28) is invariant under the map (27) and corresponds to choosing the representative of R_1, . . . , R_q with Σ_{i,j} B_{ij} R_i R_j = 1. Next we give a high-level description of the Hessian; again, this is almost identical to the one in [17, Section 4.2.1] after identifying the C_j's with R_j's and the β_j's with α_j's. Recall that Ψ_1 is a function of α_1, . . . , α_q. There is an alternative parameterization of Ψ_1: instead of α_1, . . . , α_q (restricted to Σ_i α_i = 1) we use R_1, . . . , R_q (restricted to Σ_{i,j} B_{ij} R_i R_j = 1) via (29); every α can be achieved by this parameterization. Let α_1, . . . , α_q be a critical point of Ψ_1 and let R_1, . . . , R_q satisfy (29). We evaluate Ψ_1 in a small neighborhood around α_1, . . . , α_q. It is equivalent (and easier to understand) to perturb R_1, . . . , R_q to R_1 + εR′_1, . . . , R_q + εR′_q, with the normalization of the perturbation chosen appropriately, and evaluate at the point given by (29). This yields the expression (30) for the value of Ψ_1 at the perturbed point. Note that there is no linear term, since we are at a critical point. Recall that the α_i must satisfy Σ_i α_i = 1, which corresponds to the restriction (28). Now we are ready to prove Theorem 8.
Let L be the linear map such that the Jacobian of the map r → r̂ represented by (27) is (∆ − 1)L. Finally, let S be the linear subspace defined by (28).
Proof of Theorem 8. We use the correspondence between fixpoints of the tree recursions and critical points of Ψ_1 given by Lemma 9. The condition for a fixpoint to be Jacobian attractive is that (∆ − 1)L restricted to S has spectral radius less than 1, see equation (26). The condition for a critical point to be a Hessian maximum is that the eigenvalues of (I + L)((∆ − 1)L − I) on S are negative, see equation (30).
Note that L is symmetric and if B is positive semidefinite then L is positive semidefinite (since L is congruent to B; L is obtained by multiplying B by a diagonal matrix on the left and on the right). Hence L has non-negative real spectrum. Note that S is invariant under L and hence the spectrum of L on S is a subset of the spectrum of L (it is still non-negative real; the restriction wiped out the eigenvalue 1).
The condition for the fixpoint to be Jacobian attractive, in terms of eigenvalues, is: for each eigenvalue x of L on S, (∆ − 1)|x| < 1. (31)
The condition for the critical point to be a Hessian maximum, in terms of eigenvalues, is: for each eigenvalue x of L on S, (1 + x)((∆ − 1)x − 1) < 0. (32) Note that conditions (31) and (32) are equivalent (since x ≥ 0).
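The equivalence of the two conditions for x ≥ 0 can be checked mechanically: the factor (1 + x) is strictly positive, so the sign of the product is the sign of (∆ − 1)x − 1. A small sketch sampling values of x (∆ = 5 is an arbitrary choice):

```python
# For x >= 0, the conditions (Delta-1)|x| < 1 and
# (1+x)((Delta-1)x - 1) < 0 agree: (1+x) > 0, so the product's sign
# is the sign of (Delta-1)x - 1.
Delta = 5
for k in range(2001):
    x = k / 1000.0  # sample x in [0, 2]
    jacobian_cond = (Delta - 1) * abs(x) < 1
    hessian_cond = (1 + x) * ((Delta - 1) * x - 1) < 0
    assert jacobian_cond == hessian_cond
print("conditions (31) and (32) agree for all sampled x >= 0")
```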

Remark 3.
For antiferromagnetic models, every critical point of Ψ_1 is a local maximum. Indeed, we only need to verify that (32) is satisfied at every critical point. The matrix L has non-negative entries, hence 1 is its largest eigenvalue and all other eigenvalues have magnitude less than 1 (since B is ergodic). Moreover, L has the same signature as B (since they are congruent), and hence the eigenvalues other than 1 are negative. These two facts imply (32).

Remark 4.
Note that one direction of the implication in Theorem 8, namely that a Jacobian attractive fixpoint is a Hessian local maximum, holds for every model (without the ferromagnetism assumption), since (31) always implies (32). For the reverse implication, however, the ferromagnetism assumption is essential. For example, in an antiferromagnetic model, by Remark 3, every critical point is a local maximum. For the antiferromagnetic Potts model, the only critical point is the uniform vector and hence it is always a local maximum for every value of B. On the other hand, it is straightforward to check that for the antiferromagnetic Potts model the uniform fixpoint is Jacobian unstable when B < (∆ − q)/∆.

Small Subgraph Conditioning Method
By Theorem 6, we have that for the random variable Z_G^α, when α is a global maximizer of Ψ_1, the exponential order of its second moment is twice the exponential order of its first moment. This is not sufficient, however, to obtain high-probability results, since it turns out that, in the limit n → ∞, the ratio of the second moment to the square of the first moment converges to a constant greater than 1. Hence, the second moment method fails to give statements that hold with high probability over a uniformly random ∆-regular graph. More specifically, to obtain our results we need sharp lower bounds on the partition function which hold for almost all ∆-regular graphs. In the setting just described, the second moment method only implies the existence of a graph which satisfies the desired bounds, and even that not in a sufficiently strong form.
For random ∆-regular graph ensembles, the standard way to circumvent this failure is to use the small subgraph conditioning method of Robinson and Wormald [42]. While the method is quite technical, its application is relatively streamlined when employed in the right framework. The method was first used for the analysis of spin systems in the work of [40] for the hard-core model and subsequently in [43], [16]. In [17], we extended the approach to q-spin models for all q ≥ 2, where the major technical obstacle was the computation of certain determinants which arise in the computation of the moments' asymptotics. While the arguments there are for random bipartite ∆-regular graphs, the approach extends in a straightforward manner to random ∆-regular graphs.
We defer the details of the application of the method in the present setting to Section 11.1. We state here the following lemma which is the final outcome of the method.

Proof of Theorem 7
Using Lemma 11, the proof of Theorem 7 is straightforward.
Proof of Theorem 7. Let α be a Hessian dominant phase, whose existence is guaranteed by the assumptions. By Lemma 11, with probability 1 − o(1) over the choice of the graph, the corresponding lower bound on Z_G holds. Moreover, since the model is ferromagnetic, for ∆-regular graphs G with n vertices we have (1/n) log Z_G ≥ C for some constant C > −∞ (explicitly, one can take C := (∆/2) log max_{i∈[q]} B_{ii}; see the remarks after Theorem 7). We thus obtain the desired lower bound on lim inf (1/n) log Z_G. By Jensen's inequality, we also obtain the matching upper bound on lim sup (1/n) log Z_G; note the factor exp(o(n)), which is there to account for dominant phases which are not Hessian.
This concludes the proof.

Phase Diagram for the Ferromagnetic Potts model
In this section we prove Theorem 4, detailing the phase diagram for the Potts model. To prove Theorem 4, we will use Theorem 8 and the results of Section 6. We briefly overview the approach. In order to determine the local maxima for the Potts model (or, more generally, for any ferromagnetic model), we need to compute the spectral radius of the map L : (r_1, . . . , r_q) → (r̂_1, . . . , r̂_q), where the R_i's specify a fixpoint of the tree recursions (6) and the α_i's are given accordingly. Our goal is to determine the local maxima by verifying when the spectral radius of this map (in the subspace (34)) is less than 1/(∆ − 1). Let M be the matrix of the linear map L. Note that M is symmetric and has an eigenvalue equal to 1 with eigenvector e = (√α_1, . . . , √α_q)^⊺. It follows that the eigenvalues of L in the subspace (34) are precisely the eigenvalues of M different from 1.
To proceed, we need to restrict our attention to the ferromagnetic Potts model. First we argue that the fixpoints of the tree recursions (6) in the case of the ferromagnetic Potts model are simple: they are supported on only two values. Proof of Lemma 12. W.l.o.g. we may assume that the implicit constant in (6) is 1. Proof of Lemma 13. This follows from Lemma 12.
Remark 5. Two settings of t in Lemma 13 will be of particular interest, namely t = 1 and t = q. We shall refer to the latter as the uniform fixpoint; it corresponds to the disordered phase. We shall refer to fixpoints with t = 1 as the "majority" fixpoints. This class includes either one or two (depending on the value of B, cf. Lemma 17) fixpoints where color 1 dominates and the remaining colors appear with equal probability. The ordered phases correspond to the majority fixpoint for which the ratio R_1/R_q is maximum (cf. the upcoming Lemma 18).

Remark 6. We note here that fixpoints with t = 1 exist iff B ≥ B_u. In Lemma 17, we only show one side of this equivalence; in particular, we show that B ≥ B_u implies the existence of the majority fixpoints, and this turns out to be the only "existential" fact that is needed for the proof of Theorem 4. More precisely, in Lemmas 14 and 16, we show that fixpoints with t ≠ 1, q are not attractive, which in turn implies that the corresponding phases are not local maxima of Ψ_1 and thus not dominant either.
Lemma 13 implies that in the case of the ferromagnetic Potts model, M has a very simple structure. The following simple lemma describes the eigenvalues of M.

Lemma 14. In the setting of Lemma 13, M has the following eigenvalues for 1 ≤ t < q:
• 1, with multiplicity 1;
• (B − 1)R_1²/α_1, with multiplicity t − 1;
• (B − 1)R_q²/α_q, with multiplicity q − t − 1;
• one remaining eigenvalue, determined from the trace of M.
For t = q, the eigenvalues of M are
• 1, with multiplicity 1;
• (B − 1)R_1²/α_1, with multiplicity q − 1.

Proof. We already described the eigenvector for eigenvalue 1. For every i such that 2 ≤ i ≤ t, a vector with 1 at position 1 and −1 at position i (and zeros elsewhere) yields eigenvalue (B − 1)R_1²/α_1. Similarly, for every i such that t + 1 ≤ i < q, a vector with 1 at position q and −1 at position i (and zeros elsewhere) yields eigenvalue (B − 1)R_q²/α_q. Note that in the case t = q this accounts for all the eigenvalues. In the case t < q we deduce the remaining eigenvalue by considering the trace of M.

Proof of Lemma 15. The uniform fixpoint of the tree recursions corresponds to R_1 = · · · = R_q and hence α_1 = · · · = α_q = (B + q − 1)R_1². By Lemma 14, the only relevant eigenvalue is (B − 1)/(B + q − 1) (with multiplicity q − 1), which we compare with 1/(∆ − 1) to obtain the lemma.
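At the uniform fixpoint the eigenvalue computation can be checked numerically. There M is proportional to the Potts interaction matrix; the explicit form M = B_Potts/(B + q − 1) below is our illustrative reading, chosen to be consistent with the eigenvalues stated in the proof of Lemma 15 (eigenvalue 1 on the all-ones direction and (B − 1)/(B + q − 1) with multiplicity q − 1). Parameters q = 4, B = 3, ∆ = 3 are arbitrary.

```python
import numpy as np

q, B = 4, 3.0
# At the uniform fixpoint, M is proportional to the Potts interaction
# matrix (an illustrative form consistent with the stated eigenvalues).
Bmat = np.ones((q, q)) + (B - 1.0) * np.eye(q)
M = Bmat / (B + q - 1)

eig = np.sort(np.linalg.eigvalsh(M))
# Eigenvalue (B-1)/(B+q-1) with multiplicity q-1, plus eigenvalue 1.
assert np.allclose(eig[:-1], (B - 1) / (B + q - 1))
assert np.isclose(eig[-1], 1.0)

# Attractiveness condition from Lemma 15: (B-1)/(B+q-1) < 1/(Delta-1).
Delta = 3
print("uniform fixpoint attractive:",
      (B - 1) / (B + q - 1) < 1 / (Delta - 1))
```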
Lemma 15 allows us to restrict our focus to fixpoints with 1 ≤ t ≤ q − 1. In this setting, the tree recursions, with x := R_1/R_q, y defined by y^d = x, and d := ∆ − 1, yield equation (35). The following lemma implies that all fixpoints with 2 ≤ t ≤ q − 1 are unstable in the whole non-uniqueness regime, since the respective matrices M have an eigenvalue greater than 1/(∆ − 1).
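For the majority (t = 1) fixpoints, iterating the tree recursion for the ratio x = R_1/R_q illustrates the disordered and ordered fixpoints. The recursion shown is the standard ferromagnetic Potts tree recursion for this ratio, x ↦ ((Bx + q − 1)/(x + B + q − 2))^{∆−1}, which we take as the form underlying (35) for t = 1; the parameters (q = 2, ∆ = 3, B = 4 vs. B = 2) are illustrative.

```python
def potts_ratio_map(x, q, B, Delta):
    """One step of the (t = 1) tree recursion for the ratio x = R1/Rq."""
    d = Delta - 1
    return ((B * x + q - 1) / (x + B + q - 2)) ** d

def iterate(x, q, B, Delta, steps=500):
    for _ in range(steps):
        x = potts_ratio_map(x, q, B, Delta)
    return x

q, Delta = 2, 3
# Large B (low temperature): starting above 1, the iteration settles on
# an ordered (majority) fixpoint with x > 1.
x_ord = iterate(10.0, q, B=4.0, Delta=Delta)
assert x_ord > 1 + 1e-6
assert abs(potts_ratio_map(x_ord, q, 4.0, Delta) - x_ord) < 1e-8

# Small B (high temperature): the same start collapses to the uniform
# fixpoint x = 1.
x_uni = iterate(10.0, q, B=2.0, Delta=Delta)
assert abs(x_uni - 1.0) < 1e-6
print(f"ordered fixpoint x ~ {x_ord:.4f}; disordered fixpoint x ~ {x_uni:.6f}")
```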
Proof of Lemma 16. The condition on the relevant eigenvalue, after simple manipulations, reduces to a polynomial inequality. Substituting R_1/R_q = y^d and B − 1 from equation (35), and doing the necessary simplifications, we obtain an equivalent inequality in which, since y > 1, the only non-trivial factor whose positivity must be proved is p(y) := (d − 1)y^{d+1} − dy^d + y. By Descartes' rule of signs, p(y) can have at most two positive roots (counted with multiplicity). It holds that p(1) = p′(1) = 0, so y = 1 is a double root accounting for both of them; since the leading coefficient of p is positive, p(y) is strictly positive for all y > 1.
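The double root at y = 1 and positivity beyond it can be confirmed numerically; the sketch below uses the sample degree d = 5 (any d ≥ 2 behaves the same way) and exact rational arithmetic at y = 1.

```python
from fractions import Fraction

d = 5  # sample degree; d = Delta - 1 in the text

def p(y):
    # p(y) = (d-1) y^(d+1) - d y^d + y
    return (d - 1) * y ** (d + 1) - d * y ** d + y

def p_prime(y):
    return (d - 1) * (d + 1) * y ** d - d * d * y ** (d - 1) + 1

# y = 1 is a double root (checked exactly)...
one = Fraction(1)
assert p(one) == 0 and p_prime(one) == 0

# ...and by Descartes' rule (sign pattern +, -, +: two sign changes)
# there are no further positive roots, so p(y) > 0 for all y > 1.
for k in range(1, 1000):
    y = 1 + k / 100.0
    assert p(y) > 0
print("p(1) = p'(1) = 0 and p(y) > 0 on the sampled range y in (1, 11)")
```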
In light of Lemmas 14 and 16, it remains to classify fixpoints with t = 1, i.e., the majority fixpoints. The following lemma gives the number of the majority fixpoints in the regimes of interest.
Lemma 17. When B u < B < B rc , there are exactly two distinct majority fixpoints. When B ≥ B rc , there is exactly one majority fixpoint.
Remark 7. At B = B u , it follows from the proof of Lemma 17 that there is a unique majority fixpoint which "bifurcates" in the regime B u < B < B rc .
Proof of Lemma 17. We examine (35) for t = 1 and determine how many values y > 1 satisfy the equation in the two regimes B_u < B < B_rc and B ≥ B_rc. For t = 1, equation (35) reads as (36), where p(y) is the polynomial in (37). Employing Descartes' rule of signs, we see that p(y) has one or three positive roots, counted with multiplicity. It is easy to check that p(1) = p′(1) = 0, so p has in fact three positive roots counted with multiplicity (since 1 is a double root); let ρ denote the remaining positive root. We next prove that ρ > 1, so that p(y) ≤ 0 if 1 ≤ y ≤ ρ and p(y) ≥ 0 if y ≥ ρ. It follows that for positive y we have p(y) > 0 iff y > ρ.
From the above, it follows that f(ρ) = min_{y≥1} f(y). Observe also that f(y) → ∞ as y → ∞, while f(y) tends to a finite limit as y ↓ 1. Thus, as y ↓ 1, we have B ↑ B_rc (viewing B as a function of y, see (36)).
To obtain the lemma, it thus suffices to show that B_u = f(ρ) + 1. We reparameterize z → 1/z, so that B_u is the unique value of B for which the resulting polynomial r(z) has a double root in (1, ∞). Let z_c be the double root of this polynomial when B = B_u. Solving each of r(z_c) = 0 and r′(z_c) = 0 with respect to B and equating the expressions, we obtain that p(z_c) = 0. It follows that for B = B_u the double root of the polynomial r(z) is equal to ρ. Now, solving r(ρ) = 0 with respect to B gives B_u = f(ρ) + 1, as wanted.
We can now classify the stability of fixpoints with t = 1.
Lemma 18. For B > B u , exactly one majority fixpoint is Jacobian attractive. More precisely, the only Jacobian attractive fixpoint with t = 1 is the one maximizing x = R 1 /R q (among the solutions of (35) for t = 1).
We can now explicitly specify the marginal a in the definition of the ordered phase stated in the Introduction. With x as in Lemma 18, we apply (20), which yields: Next, we give the proof of Lemma 18.
From these two Items, the lemma follows. For Item 1, let W := λ_2 − λ_1. Expanding everything out, we obtain an expression from which W > 0 (since B > 1 and R_1 > R_q).
For Item 2, let Q := λ_2 − 1/(∆ − 1). Expanding everything out, checking whether Q < 0 reduces, after substituting x = y^d and B − 1 from (35), to an equivalent inequality in terms of the polynomial p(y) defined in (37). By the proof of Lemma 17, p(y) > 0 iff y > ρ. The proof of Lemma 17 further yields that the latter inequality, throughout the regime B > B_u, is satisfied only by the majority fixpoint with x maximum, thus yielding Item 2. This concludes the proof.
Having classified the fixpoints which are Jacobian attractive, we now need to determine when these are dominant. This entails comparing the values of Ψ_1 for the respective phases. Rather than doing this directly, we use Lemma 9: it is equivalent to compare the values of Φ_1 at the fixpoints. Moreover, note that the expression (15) is invariant under scaling the R_i's by a common factor, and hence we only need to compare Φ_1(x, 1, . . . , 1) and Φ_1(1, . . . , 1), where x is a solution of (35) for t = 1.
Proof. By a direct calculation, using the substitutions d = ∆ − 1, x = y^d and the second equation in (35), after careful manipulations we obtain an expression for the difference DIF := Φ_1(x, 1, . . . , 1) − Φ_1(1, . . . , 1). It is straightforward to check that DIF = 0 for y = (q − 1)^{2/(d+1)}. To find the respective value of B for this value of y, we plug y = (q − 1)^{2/(d+1)} into the second equation in (35) for t = 1; in particular, from (35) we obtain B as a function of y, which we evaluate at y = (q − 1)^{2/(d+1)}.

It follows that the critical value of B is as claimed. To prove the lemma, it thus remains to show that y is an increasing function of B and that DIF increases as y increases. This is indeed true: using (35), one calculates (see the relevant (36)) that the monotonicity is governed by the polynomial p(y) defined in (37), whose positivity has already been established for all y > ρ (see the proof of Lemma 17). The claim follows.

#BIS-hardness for Potts
We first give a rough description of our reduction. We will construct a gadget G which is a balanced bipartite graph on (2 + o(1))n vertices. There will be n′ = O(n^{1/8}) vertices on each side of G which have degree ∆ − 1; the remainder have degree ∆. The key is that G behaves similarly to a random bipartite ∆-regular graph. Hence, for the ferromagnetic Potts model, when B > B_o, the q ordered phases will dominate (ferromagnetic models on random bipartite ∆-regular graphs have the same dominant phases as on random ∆-regular graphs, see footnote 6). We will take an instance H for #FerroPotts(q, B, ∆) where H has n′ vertices. We then replace each vertex in H by a copy of the gadget G, and use the degree-(∆ − 1) vertices in these gadgets to encode the edges of H, while preserving bipartiteness. The resulting graph H^G will have maximum degree ∆, and the Potts model on H^G will "simulate" the Potts model on H.
The gadget G is identical to the one used by Sly [43]. Before giving the detailed description of the gadget, it is instructive to explain at an intuitive level its basic construction and the properties we are trying to ensure. Roughly, the gadget is a random bipartite graph G with (∆ − 1)-ary trees of depth Θ(log n) attached to a small set of vertices on each side of G. We use the random bipartite graph to ensure that the final gadget has q ordered phases when B > B_o, which will be used to encode the q spins of the Potts model in the graph H. The trees will ensure, roughly, that the relative error we introduce by approximating the partition function of H with the partition function of H^G is polynomially small. To explain this further, the reader should keep in mind that the analysis of the Potts model on the random graph G will only give estimates of its partition function within a multiplicative factor (1 ± ε) for some small constant ε > 0. In turn, using this bound to analyze the Potts model on H^G would result in estimating the partition function of H within a multiplicative factor (1 ± ε)^{|H|}, which is far larger than the polynomial accuracy we seek in approximation-preserving reductions. This obstacle is precisely the reason why the trees are attached to G: the trees boost the constant ε to a much smaller quantity of order n^{−O(1)}, and then the final approximation can be made polynomially small as desired (see footnote 4). As in [43,16,7], this boosting is possible due to the fact that the ordered phases correspond to non-reconstructible Gibbs measures on the infinite ∆-regular tree (we expand on this in the proof of Lemma 20, Item 2).
Next, we give the description of the gadget G. The gadget G is defined by two parameters θ, ψ with 0 < θ, ψ < 1/8. The construction of the gadget G has two parts. First construct the following bipartite graph G with vertex set V^+ ∪ V^−. For s ∈ {+, −}, |V^s| = n + m′, where m′ will be defined precisely later. Take ∆ random perfect matchings between V^+ and V^−, and then remove a matching of size m′ from one of the ∆ matchings. Call this graph G. For later use, let U := U^+ ∪ U^− denote the vertices of degree ∆ in G and W := W^+ ∪ W^− denote the vertices of degree ∆ − 1 in G.
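The first stage of the construction can be sketched in code: take ∆ random perfect matchings between the two sides (giving a bipartite ∆-regular multigraph) and delete m′ edges of one matching, leaving exactly m′ vertices of degree ∆ − 1 on each side. A minimal sketch with toy sizes; the tree-attachment stage and the precise choice of m′ are omitted.

```python
import random

random.seed(0)
Delta, n, m_prime = 4, 30, 5
N = n + m_prime  # vertices per side, as in the construction

# Each matching is a permutation: left vertex i is matched to right
# vertex pi[i]. Take Delta independent uniform perfect matchings.
matchings = [random.sample(range(N), N) for _ in range(Delta)]

# Remove a matching of size m' from the last perfect matching: drop the
# edges incident to left vertices 0, ..., m'-1.
edges = []
for t, pi in enumerate(matchings):
    for i, j in enumerate(pi):
        if t == Delta - 1 and i < m_prime:
            continue  # deleted edge
        edges.append((i, j))

deg_left = [0] * N
deg_right = [0] * N
for i, j in edges:
    deg_left[i] += 1
    deg_right[j] += 1

# Exactly m' vertices of degree Delta-1 on each side; the rest have
# degree Delta (parallel edges may occur, as in the configuration model).
assert deg_left.count(Delta - 1) == m_prime
assert deg_right.count(Delta - 1) == m_prime
assert deg_left.count(Delta) == N - m_prime
assert deg_right.count(Delta) == N - m_prime
print(f"gadget core: {2 * N} vertices, {len(edges)} edges")
```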
In the second stage, for each side of G, partition the degree-(∆ − 1) vertices into n^θ equal-sized sets and attach to each set a (∆ − 1)-ary tree of depth ℓ, where ℓ = ⌊ψ log_{∆−1} n⌋ (use the vertices of G as the leaves of these trees). Hence each side contains n^θ trees of size O(n^ψ). (More precisely, (∆ − 1)^{⌊θ log_{∆−1} n⌋} trees, each having (∆ − 1)^{⌊ψ log_{∆−1} n⌋} leaves.) This defines the gadget G. For s ∈ {+, −}, let R^s denote the roots of the trees on side s and R := R^+ ∪ R^−. Note that each vertex in R has degree ∆ − 1; these will be used to encode the edges of H (we give the reduction explicitly just after stating the relevant gadget lemma, Lemma 20). Note that m′ = (∆ − 1)^{⌊θ log_{∆−1} n⌋ + ⌊ψ log_{∆−1} n⌋} and m′ = o(n^{1/4}).

Footnote 4: Let us remark that sometimes even the (1 ± ε) factor estimates are sufficient to get strong inapproximability results and the attachment of the trees is not needed, see for example the reductions in [44,17]. The difference in those settings is that the corresponding counting problems, i.e., counting independent sets or counting colorings, can be connected to NP-hard problems (such as Max-Cut), yielding that it is NP-hard to approximate the corresponding partition function even within an exponential factor. In contrast, for problems like #BIS or #FerroPotts (which correspond to easy decision problems, e.g., finding the maximum independent set in a bipartite graph or the maximum weight configuration in the ferromagnetic Potts model), such strong inapproximability results are not known; in fact, we only have evidence that an FPRAS is unlikely to exist. This necessitates the study of approximation-preserving reductions in our setting (as in, e.g., [14,7,33]) and thus our quest for polynomial precision.
Denote by G = (V, E) the final graph. Recall that for a configuration σ ∈ Ω, the set of vertices assigned spin i is denoted by σ^{−1}(i). The phase of a configuration σ : V → [q] is defined as the dominant spin among the vertices in U = U^+ ∪ U^− (the vertices of degree ∆ in G), where ties are broken with an arbitrary deterministic criterion (e.g., the lowest index).
The gadget G behaves like a random bipartite ∆-regular graph because m′ ≪ n, as we will detail in the upcoming Lemma 20. Hence, since B > B_o, Theorem 4 implies that the q ordered phases are dominant. Therefore, for a sample σ from the Gibbs distribution, the phase of σ will be (close to) uniformly distributed over these q ordered phases. Let phase i refer to the ordered phase where spin i is the majority. Once we condition on the phase for the vertices in U, say it is phase i, then each of the roots of the trees appended to G, roughly independently, will have spin i with probability ≈ p and spin j ≠ i with probability ≈ (1 − p)/(q − 1), where p is the probability that the root of the infinite (∆ − 1)-ary tree has spin i in the Gibbs measure corresponding to the ordered phase i (see footnote 5). Hence, for each of the q possible phases, we define the product distribution Q^i_R(·) on the configurations σ_R : R → [q]. For future use, one can define completely analogously the product measure Q^i_W(·) on configurations σ_W : W → [q] (recall that W is the set of vertices of degree ∆ − 1 in G).
The following lemma is proved using methods in [43] and its proof is given in Section 9.1. Roughly, the first item in the lemma follows from the symmetries of the Potts model. For the second item, the rough idea is that when the phase is i, the marginal spin distribution of the vertices in W in the graph G is close to Q^i_W. The purpose of the trees is to boost this effect; more precisely, to make the distance between the marginal spin distribution of the vertices in R and Q^i_R an inverse polynomial factor (see Item 2 in Lemma 20). In turn, the reason that the trees can accommodate this "boosting" is that the marginal distribution on W corresponds to an extremal Gibbs measure on the tree, which results in the spins of the roots of the trees being strongly concentrated.
Lemma 20. For every q, ∆ ≥ 3 and B > B_o, there exist constants θ, ψ > 0 such that the graph G satisfies the following with probability 1 − o(1) over the choice of the graph:
1. The phases occur with roughly equal probability, so that for every phase i ∈ [q], we have the stated estimate.
2. Conditioned on the phase i, the spins of the vertices in R are approximately independent, that is, the marginal distribution on R is close to Q^i_R.
Alternatively, p = x/(x + q − 1), for the same x as in (38).
With Lemma 20 at hand, we can now formally state the reduction that we sketched earlier. Let B > B_o. Let H be a graph on n′ vertices, where n′ ≤ n^{θ/4} and θ is as in Lemma 20. Assuming an FPRAS for the ferromagnetic Potts model on max-degree-∆ graphs and parameter B, we will show that we can approximate Z_H(B*), the partition function of H in the ferromagnetic Potts model with parameter B*, where B* will be determined shortly.
To do this, we first construct a graph H^G. Take |H| disjoint copies of the gadget G in Lemma 20 and identify each copy with a vertex v ∈ H. Denote by Ĥ^G the resulting graph, by G_v the copy of the gadget associated with the vertex v of H, and by R^+_v, R^−_v, R_v the images of R^+, R^−, R in the gadget G_v, respectively. We next add the edges of H in Ĥ^G. To do this, fix an arbitrary orientation of the edges of H. For each oriented edge (u, v) of H, we add an edge between one vertex in R^+_u and one vertex in R^−_v, using mutually distinct vertices for distinct edges of H. The resulting graph is denoted H^G. Note that H^G is bipartite and has maximum degree ∆.
For a graph H and activity B ≥ 1, recall that Z_H(B) is the partition function for the ferromagnetic Potts model at activity B on the graph H. We have the following connection.

Lemma 21. There exists B* > 1 (depending only on q, ∆, B) such that the following holds for every graph H with n′ vertices:

Using Lemma 21, we can now prove that for all ∆ ≥ 3 and all B > B_o, it is #BIS-hard to approximate the partition function of the ferromagnetic Potts model on bipartite graphs of maximum degree ∆.
Proof of Theorem 2. Goldberg and Jerrum [21] showed that for every B > 1 it is #BIS-hard to approximate the partition function of the ferromagnetic Potts model on all graphs. Fix ∆, q ≥ 3 and B > B_o for which we intend to prove Theorem 2, and let B* = B*(q, ∆, B) > 1 be specified as in Lemma 21. We first show that an FPRAS for approximating the partition function with activity B on bipartite graphs with maximum degree ∆ implies an FPRAS for approximating the partition function with activity B* on all graphs. It will then be clear that our reduction is in fact approximation-preserving and hence the theorem will be proven.
Suppose that there exists an FPRAS for approximating the partition function with activity B on bipartite graphs with maximum degree ∆. Take an input instance H for which we would like to estimate the partition function of the Potts model at activity B*. First generate a random gadget G using the construction defined earlier; this graph G satisfies the properties in Lemma 20 with probability 1 − o(1). Approximate the partition function of G at activity B within a multiplicative factor 1 ± ε/(10n′) using our presumed FPRAS (where, recall, n′ is the number of vertices in H). Also, using the presumed FPRAS, approximate the partition function of H^G at activity B within a multiplicative factor 1 ± ε/5. The bounds for Z_H(B*) in Lemma 21 are then within a factor 1 ± ε for sufficiently large n, implying an FPRAS for approximating the partition function at activity B*. This, together with the result of [21], implies an FPRAS for counting independent sets in bipartite graphs.
Proof of Lemma 21. Recall that Ĥ^G is the disjoint union of the copies of the gadget, as defined in the construction of H^G. Note that Z_{Ĥ^G}(B) = Z_G(B)^{n′}. Hence, to prove the lemma it suffices to analyze the ratio Z_{H^G}(B)/Z_{Ĥ^G}(B). For a configuration σ on H^G and each v ∈ H, let Y_v(σ) denote the phase of σ on G_v. Denote the vector of these phases by Y(σ) = (Y_v(σ))_{v∈H} ∈ [q]^H; we refer to Y(σ) as the phase vector of σ.
For U ∈ [q]^H, let Ω_U denote the set of configurations σ on H^G with Y(σ) = U. Let Z_{H^G}(U) be the partition function of H^G restricted to configurations σ ∈ Ω_U; here, for a configuration σ, m(σ) denotes the number of monochromatic edges under σ. We may view U as an assignment V(H) → [q], where V(H) is the vertex set of H; hence, we can consider the number of monochromatic edges in the graph H under the assignment U, which we denote by m(U). Recall that the goal is to analyze the ratio Z_{H^G}(B)/Z_{Ĥ^G}(B). To this end, we will analyze Z_{Ĥ^G}(U) for every U and then use that every U is (close to) equally likely in Ĥ^G, which follows from Property 1 in Lemma 20.
Denote by R_H the set of vertices ∪_v R_v, i.e., the union of all the vertices of degree ∆ − 1 in Ĥ_G. Notice that once we fix an assignment to all of the vertices in R_H, by the Markov property of the model the contributions of the gadgets factorize, since Ĥ_G is a union of disconnected copies of G and in each copy of G we have Property 2 of Lemma 20. It follows that Z_{H_G}(U) can be expressed in terms of Z_{Ĥ_G}(U), where A (resp. D) is the expected weight of an edge connecting two gadgets which have the same (resp. different) phases. Simple calculations make A and D explicit. Letting B* = A/D and C_H = D^{|E(H)|}, we obtain an expression for Z_{H_G}(U) in terms of C_H, (B*)^{m(U)} and Z_{Ĥ_G}(U). Property 1 in Lemma 20 gives, for every U, a comparison of Z_{Ĥ_G}(U) across the different phase vectors. Using the estimates (42), (43) in (44), we obtain upper and lower bounds on Z_{H_G}(B). The result follows after observing that Z_{Ĥ_G}(B) = Z_G(B)^{n′} and rearranging the inequality.

Proving the properties of the gadget
In this section, we prove the properties of the gadget we use, as stated in Lemma 20. We outline the proof and introduce the relevant notation. The proof follows the same approach as in [43, Theorem 2.1] and uses non-reconstruction results from [35]. We argue more carefully for Item 1 in Lemma 20, however, since in [43] a cruder bound for the probability that a phase appears was sufficient. In our case, the more delicate bound will follow from the symmetries of the Potts model. We first illustrate how symmetry comes into play. Let Σ^i_G be the set of configurations on G which have phase i, and let Σ^o_G consist of those configurations whose phase was determined by breaking a tie. We first show that Item 1 in Lemma 20 will follow from showing that Σ^o_G has exponentially smaller contribution to the partition function of G than Σ^i_G for every i ∈ [q]. To capture this, for a subset Σ ⊆ Ω_G of the configuration space, denote by Z_G(Σ) the partition function restricted to configurations in Σ, that is, Z_G(Σ) = Σ_{σ∈Σ} B^{m(σ)}. Let π be a permutation of the colors [q] which maps color i to color j. For a configuration σ, we denote by π(σ) the configuration π ∘ σ. Clearly, m(π(σ)) = m(σ) for every configuration σ, so Z_G(Σ^i_G) = Z_G(Σ^j_G) for all i, j ∈ [q]. Hence, to get the inequality in Item 1 of Lemma 20 it suffices to show that Z_G(Σ^o_G) is smaller than Z_G by a sufficiently large polynomial factor.
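To make the symmetry step explicit (a sketch from the definitions above, assuming, as the tie-breaking convention suggests, that Σ^o_G together with the classes Σ^i_G covers Ω_G):

```latex
% A permutation \pi of the colors preserves the number of monochromatic
% edges, m(\pi(\sigma)) = m(\sigma), so Z_G(\pi(\Sigma)) = Z_G(\Sigma)
% for every \Sigma \subseteq \Omega_G; in particular all the Z_G(\Sigma^i_G)
% are equal.  Hence
\[
  Z_G \;\le\; Z_G(\Sigma^o_G) + \sum_{i \in [q]} Z_G(\Sigma^i_G)
      \;=\; Z_G(\Sigma^o_G) + q\, Z_G(\Sigma^i_G),
  \qquad\text{so}\qquad
  Z_G(\Sigma^i_G) \;\ge\; \frac{Z_G - Z_G(\Sigma^o_G)}{q}.
\]
```

Thus a polynomial-factor bound on Z_G(Σ^o_G)/Z_G yields the kind of gap needed for Item 1.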
We briefly outline the argument for proving that Z_G(Σ^o_G) is smaller than Z_G by a sufficiently large polynomial factor, introducing at the same time some relevant notation. First, note that the definition of the phase of a configuration makes sense for configurations on 𝔾, the random graph drawn in the first step of the construction (before the trees are attached), as well. For convenience, we will henceforth use Z^o_G, Z^i_G as shorthands for Z_G(Σ^o_G), Z_G(Σ^i_G), and Z^o_𝔾, Z^i_𝔾 for their analogues on 𝔾. Roughly, we will first show that Z^o_𝔾 is exponentially smaller than Z^i_𝔾 with probability 1 − o(1) (over the choice of 𝔾). This part follows from the fact that the q ordered phases are dominant and the fact that Z^i_𝔾 matches its expectation (up to a polynomial factor) with probability 1 − o(1) (this is the analogue of Lemma 11 for the graph distribution induced by 𝔾). We will then show that Z^o_G is smaller than Z^i_G by a factor of exp(n^{1/4}), by crudely accounting for the contribution of the trees attached in the second step of the construction of G. Summing over i ∈ [q] then yields the desired bound and thus completes the "symmetry" argument.
To formalize the outline in the previous paragraph, we will have to capture how the partition functions Z_G and Z_𝔾 interact. Due to the Markov property, this happens only through vertices in W (recall, this is the set of vertices of degree ∆ − 1 in the graph 𝔾 on which the trees are attached). Thus, we will partition the sets Σ^o_G, Σ^i_G according to the configuration η on W. In particular, Σ^o_G(η) will be those configurations σ in Σ^o_G such that σ_W = η, and Z^o_G(η) will be the contribution to the partition function of G from configurations in Σ^o_G(η). Define Σ^i_G(η) and Z^i_G(η) similarly. We need a final piece of notation. Let J be the union of the trees appended in the second step of the construction of the gadget G. Note that the only vertices of 𝔾 included in J are vertices in W. Let Z_J(η) be the contribution to the partition function of J from configurations σ (on J) such that σ_W = η. We are now able to put these definitions to work; in particular, Z^o_G and Z^i_G decompose as sums over η of the corresponding restricted partition functions weighted by Z_J(η). We will need the following lemma (Lemma 22), which is proved by combining techniques from [43] with the results discussed above; its Item (ii) applies for all sufficiently small ε > 0, all sufficiently large n and all i ∈ [q]. Proof of Lemma 22. The first equality in (45) is standard (the explicit derivation in the hard-core model is straightforward to adapt to the present setting as well). Combining the above yields Item (i). Item (ii) is a consequence of the fact that, for some small ε′ > 0, for each ordered phase i there exists an ε′-ball around it consisting solely of configurations which are contained in Σ^i_𝔾 \ Σ^o_𝔾. Since the ordered phase i is dominant, a standard compactness argument (see for example the upcoming proof of Lemma 27) yields that (1/n) log E_𝔾 Z^o_𝔾 is strictly less than (1/n) log E_𝔾 Z^i_𝔾. Finally, Item (iii) follows from the small subgraph conditioning method in [17, Appendix A] (see also Theorem 28).
We conclude this section by giving the proof of Lemma 20.
Proof of Lemma 20. To get Item 1, by the symmetry argument described at the beginning of the section, it suffices to show that for every i ∈ [q] it holds that Z^o_G ≤ exp(−n^{1/4}) Z^i_G with probability 1 − o(1) over the choice of the graph G. We use Lemma 22; in particular, Markov's inequality yields the required bound.

Footnote 6. Dominant phases on random bipartite ∆-regular graphs correspond to the global maximizers of max_{R,C} (R^⊺ B C)/(‖R‖_p ‖C‖_p), where p = ∆/(∆ − 1); see [17, Theorem 4.1]. As a consequence of Observation 1, for a ferromagnetic model, any such maximizer must satisfy R = C (up to a scaling factor). This yields that the maximizers are in one-to-one correspondence with the maximizers of the r.h.s. of (16).
Item 2 of the lemma follows exactly the approach in [43]. The non-reconstruction results required to carry out the approach of [43] are given in [35, Proof of Theorem 1.4] (ferromagnetic Potts model on the tree with constant boundary condition). Together with Lemma 22, the proof of [43, Theorem 2.1] extends almost verbatim to our case as well. We briefly outline the main ideas of the proof as carried out in [43].
Recall that the goal is to show that, conditioned on the phase i, the distribution of the spins on vertices in R is close to Q^i_R(·). For i ∈ [q], define (analogously to [43]) the set B^i of "bad" configurations on W which exert large influence on vertices in R. Note that, while we defined B^i using the Gibbs distribution of the graph G, we could have used instead the Gibbs distribution of J since, by the Markov property, conditioned on the spins of the vertices in W, the spins of the vertices in J are conditionally independent of the rest of the vertices in the graph G. Back to the proof: using inequalities analogous to those we used to prove Item 1, it can be proved that the required bound reduces to a bound for the measure ν^i, defined on the space of all configurations η : W → [q]. Our goal is thus to show (55). It is useful to note at this point that the bound in (55) is a property of the trees and, in particular, does not depend on the Gibbs distribution of the (random) graph 𝔾. (Indeed, ν^i is specified by the Gibbs distribution on the graph J, which is a disjoint union of (∆ − 1)-ary trees, and the product measure Q^i(η). Also, B^i is specified by the Gibbs distribution of the graph J; see (52).) Also, since J is the disjoint union of a polynomial number of identical trees, by a union bound it suffices to show (55) when J consists of a single (∆ − 1)-ary tree with height ℓ = Θ(log n) and W denoting the leaves of the tree. In turn, this will follow from the doubly exponential upper bound (56), where C > 0 is a constant. (To recover (55) from (56), we just need to tune the parameter θ of the gadget to ensure that the trees have sufficiently large height ℓ relative to θ; recall that ℓ = ⌊ψ log_{∆−1} n⌋, so we can choose any constant θ with 0 < θ < Cψ/(2 ln(∆−1)).)
We will conclude this proof by sketching the main idea behind the strong bound in (56); a detailed proof of the bound, with all the relevant connections, can be found in Appendix A. The bound in (56) goes back to the works of Martinelli, Sinclair and Weitz [34,35], who studied the mixing time of Glauber dynamics on trees with boundary conditions, and was first used for the construction of gadgets by Sly [43]. A key idea, captured in [43, Proof of Lemma 4.3], is that the measures ν^i on configurations η : W → [q] can be viewed as projections of Gibbs measures corresponding to the ordered phases on the infinite (∆ − 1)-ary tree. The Gibbs measure corresponding to the ordered phase i can be obtained by taking the weak limit, as ℓ → ∞, of the Potts distribution on a finite tree of depth ℓ whose leaves are conditioned to have spin i. In Appendix A.2, we give an alternative Markov chain construction of these Gibbs measures using a broadcasting process, which is more convenient to work with. These Gibbs measures are well known to be extremal; equivalently, the broadcasting process has the non-reconstruction property, which roughly says that the spin of the root cannot be reconstructed from a typical configuration on the leaves of the tree (asymptotically in ℓ); see Appendix A.1 for the details that are relevant in our setting and the survey [38] for more on broadcasting processes on trees. The techniques of [34,35] further show that a certain eigenvalue condition on the relevant broadcasting matrix allows one to quantify the dependence on ℓ and thus obtain the bound in (56). In Appendix A, we use a result of a similar flavor from Sly and Zhang [46], which we can apply more directly in our setting.
This concludes the proof of Lemma 20.
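The broadcasting construction discussed above can be illustrated concretely (a minimal sketch with our own parameter and function names, not the paper's): starting from a root with spin i, each child independently copies its parent's spin with probability B/(B + q − 1) and is otherwise uniform over the other q − 1 spins; projecting the resulting leaf configuration gives the analogue of the measure ν^i.

```python
import random

def broadcast(b, depth, q, B, root_spin, rng):
    """Broadcast spins down a complete b-ary tree of the given depth.

    Each child copies its parent's spin with probability B/(B+q-1)
    and otherwise is uniform over the remaining q-1 spins (the
    ferromagnetic Potts broadcast channel).  Returns the leaf spins.
    """
    p_copy = B / (B + q - 1)
    level = [root_spin]
    for _ in range(depth):
        nxt = []
        for spin in level:
            for _ in range(b):
                if rng.random() < p_copy:
                    nxt.append(spin)
                else:
                    # uniform over the q-1 spins different from the parent's
                    other = rng.randrange(q - 1)
                    nxt.append(other if other < spin else other + 1)
        level = nxt
    return level

rng = random.Random(42)
leaves = broadcast(b=2, depth=10, q=3, B=2.0, root_spin=0, rng=rng)
# In expectation the root's color is over-represented among the leaves
# for B > 1, but by non-reconstruction this bias decays with the depth.
freq0 = leaves.count(0) / len(leaves)
```

Non-reconstruction corresponds to the root's influence on the empirical leaf distribution washing out as the depth grows.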

#BIS-Hardness for Bipartite Colorings
Using our #BIS-hardness result for the ferromagnetic Potts model on bounded-degree graphs, we now prove our #BIS-hardness result for colorings on bounded-degree bipartite graphs (Corollary 3). The reduction between these two problems was first observed in [14]; here, we just have to work out the bound on k that the application of Theorem 2 yields.
Proof of Corollary 3. We will show that (57) holds for all integers k, ∆ ≥ 3, and that (58) holds whenever k ≤ ∆/(2 ln ∆). The corollary will then follow from Theorem 2.
To prove (57), let G = (V, E) be an input graph to the problem #BipFerroPotts(k, B, ∆) with B = (k − 1)/(k − 2) > 1. Construct an instance G′ of #BipColorings(k, ∆) by subdividing each edge of G; i.e., G′ is the graph with vertex set V′ = V ∪ E and edge set E′ = ∪_{e=(u,v)∈E} {(u, e), (e, v)}. It is clear that G′ is bipartite and every vertex has degree at most ∆.
We claim that the partition function for the k-state ferromagnetic Potts model on G with B = (k − 1)/(k − 2) is equal to the number of proper k-colorings on G ′ (times an easily computable factor equal to (k − 2) |E| ).
To see this, for a k-coloring σ′ of G′, map σ′ to the configuration σ on G given by the restriction of σ′ to the vertices in V. Let σ : V → [k] be any configuration of the Potts model on G. The claim will follow by showing that the number of colorings of G′ which map to σ is given by (k − 1)^{m(σ)} (k − 2)^{|E|−m(σ)}, where m(σ) denotes the number of monochromatic edges in G under σ. Indeed, for a monochromatic edge e = (u, v) ∈ E under σ, there are k − 1 ways to choose the color of the vertex e ∈ V′ in the graph G′. In contrast, if e = (u, v) ∈ E is not monochromatic under σ, there are k − 2 ways to choose the color of the vertex e ∈ V′ in the graph G′. This completes the proof of (57).
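The identity behind (57) can be verified by brute force on a small example (a sketch; the graph and helper names are ours): the number of proper k-colorings of the subdivided graph equals (k − 2)^{|E|} times the Potts partition function of the original graph at B = (k − 1)/(k − 2).

```python
from itertools import product

def potts_partition(vertices, edges, k, B):
    """Sum of B^{#monochromatic edges} over all k-spin configurations."""
    total = 0.0
    for sigma in product(range(k), repeat=len(vertices)):
        mono = sum(1 for (u, v) in edges if sigma[u] == sigma[v])
        total += B ** mono
    return total

def count_colorings_subdivision(vertices, edges, k):
    """Proper k-colorings of the graph obtained by subdividing every edge."""
    n = len(vertices)
    new_edges = []
    for idx, (u, v) in enumerate(edges):
        mid = n + idx                     # the new degree-2 vertex for edge idx
        new_edges.extend([(u, mid), (mid, v)])
    count = 0
    for sigma in product(range(k), repeat=n + len(edges)):
        if all(sigma[u] != sigma[v] for (u, v) in new_edges):
            count += 1
    return count

# A 4-cycle with a chord, k = 4, so B = (k-1)/(k-2) = 3/2.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
k = 4
B = (k - 1) / (k - 2)
lhs = (k - 2) ** len(E) * potts_partition(V, E, k, B)
rhs = count_colorings_subdivision(V, E, k)
```

Here `lhs` and `rhs` agree, matching the counting argument edge by edge.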
We next show (58). For ∆ ≤ 16, we have k ≤ ∆/(2 ln ∆) < 3, so we may assume that ∆ ≥ 17 (otherwise, there is nothing to prove). We first reduce (58) to the case k = ∆/(2 ln ∆). Let d := ∆ − 1 and fix d ≥ 16. The polynomial h_d(z) has two changes of sign, so by Descartes' rule of signs it has at most two positive roots. Clearly, z = 1 is a root of h_d(z), and since h′_d(1) > 0, there is one more root z_d > 1. Thus, h_d(z_o) > 0 iff z_o < z_d (note that z_o > 1). Thus, to show (58), it suffices to consider the case k = ∆/(2 ln ∆). Now, we prove the desired inequality for k = ∆/(2 ln ∆); it follows from a bound which holds for k = ∆/(2 ln ∆) and all ∆ ≥ 17, as wanted. This completes the proof of (58) and concludes the proof of Corollary 3.

Torpid mixing of Swendsen-Wang
In this section, we prove Theorem 5 about torpid mixing of the Swendsen-Wang algorithm at the critical activity B = B_o. More precisely, we will show that, with probability 1 − o(1) over the choice of a random ∆-regular graph with n vertices, the mixing time of the Swendsen-Wang algorithm is exponential in n. We will exploit Theorem 4 for B = B_o, which in combination with Lemma 11 essentially implies that for this value of B we have coexistence of the ordered and disordered phases in a random ∆-regular graph (with probability 1 − o(1)). Denote by u the disordered phase and by m_1, . . . , m_q the q ordered phases of Theorem 4. Note that the q ordered phases are identical up to a permutation of the colors, and for the purposes of this section we can treat them in a uniform manner. Thus, denote m = {m_1, . . . , m_q}. We will say that a configuration σ is close to u (resp. m) if the color frequencies in σ are close to those prescribed by u (resp. one of m_1, . . . , m_q).
Lemma 11 implies that, with probability 1 − o(1) over the choice of the graph, the sets of configurations near u and m dominate the Gibbs distribution, in the sense that these sets each have measure ≥ 1/poly(n) and the rest of the configurations have exponentially smaller mass. To analyze the Swendsen-Wang algorithm we need a more refined picture which includes the number of monochromatic edges in such configurations. To this end, we define in (59) the quantities E_m and E_u, which roughly correspond to the expected number of monochromatic edges for configurations in these two sets, scaled by a factor of n (see the upcoming equation (64) and the remarks thereafter for the derivations); here x, defined in Lemma 19, is a solution of the normalized tree recursions. Now we can define the sets of configurations with vertex marginals close to u and m and edge marginals close to nE_u and nE_m, respectively. For a configuration σ ∈ Ω, let e_G(σ) denote the number of monochromatic edges in G under the spin configuration σ. Recall that σ^{−1}(i) is the set of vertices with spin i in σ. Let c(σ) denote the vector (|σ^{−1}(1)|/n, . . . , |σ^{−1}(q)|/n). For ε > 0, the resulting sets U and M are defined in (60) and (61). The following lemma is proved in Section 10.1 and is the main tool for obtaining our torpid mixing results.
Lemma 23. For all sufficiently small ε > 0, there exists C > 0 such that, with probability 1 − o(1) over the choice of the graph G ∼ G(n, ∆), the bound (63) holds. For the rest of this section, we fix a graph G whose Gibbs distribution satisfies (63). By Lemma 23, this holds for asymptotically almost all ∆-regular graphs. Now, to prove that the chain is torpidly mixing, we will bound its conductance, defined as Φ_SW = min_{S : ∅ ⊂ S ⊂ Ω} Φ_SW(S), where

Φ_SW(S) = (Σ_{σ∈S} µ(σ) P(σ, S̄)) / (µ(S) µ(S̄)),

P denotes the transition matrix of Swendsen-Wang and S̄ = Ω \ S. To bound the conductance of the set M, we prove that a configuration in M is unlikely to transition to U in one step.
Proof. We are going to argue that, with probability 1 − exp(−Ω(n)), the number of monochromatic edges after one transition of Swendsen-Wang is too large for the resulting configuration to be in the set U. Note that the third step of Swendsen-Wang cannot decrease the number of monochromatic edges, so it suffices to analyze the first two steps.
Since σ ∈ M , by definition, the number of monochromatic edges under σ is (E m ± ε)n. The expected number of edges left after the second step of Swendsen-Wang is thus (1 − 1/B)(E m ± ε)n.
The following claim implies that for sufficiently small ε, this is greater than (E u ± ε)n, the number of monochromatic edges in a configuration from U . The proof is given in Section 10.1.
By standard Chernoff bounds, we can thus conclude that for sufficiently large n the transition from σ to U happens with exponentially small probability.
Remark 8. We believe that the lower bound on q in Theorem 5 arises from ignoring the effect of the third step of Swendsen-Wang.
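For concreteness, a single update of the Swendsen-Wang dynamics analyzed above can be sketched as follows (a minimal implementation with our own naming; the graph here is arbitrary, while the paper's chain runs on ∆-regular graphs): delete non-monochromatic edges, keep each monochromatic edge independently with probability 1 − 1/B, then recolor each connected component uniformly at random.

```python
import random

def swendsen_wang_step(n, edges, sigma, q, B, rng):
    """One Swendsen-Wang update for the q-state Potts model at activity B > 1.

    Step 1: drop all non-monochromatic edges.
    Step 2: keep each monochromatic edge independently with prob 1 - 1/B.
    Step 3: recolor every connected component with a uniform random color.
    """
    keep = [(u, v) for (u, v) in edges
            if sigma[u] == sigma[v] and rng.random() < 1.0 - 1.0 / B]
    # Union-find (with path halving) over the kept edges to get components.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in keep:
        parent[find(u)] = find(v)
    new_color = {}
    new_sigma = []
    for v in range(n):
        root = find(v)
        if root not in new_color:
            new_color[root] = rng.randrange(q)
        new_sigma.append(new_color[root])
    return new_sigma

rng = random.Random(7)
n = 6
cycle = [(i, (i + 1) % n) for i in range(n)]  # a 6-cycle
sigma = [0] * n                               # start from a monochromatic state
sigma = swendsen_wang_step(n, cycle, sigma, q=3, B=2.0, rng=rng)
```

Conditioned on the kept edges, step 3 can only merge colors within components, which is why it cannot decrease the monochromatic edge count relative to the kept edges.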
We can now bound the conductance Φ SW of Swendsen-Wang.

Phase Coexistence for Random ∆-regular Graphs
In this section, we prove Lemma 23. The lemma will mostly follow from Lemma 11; we will, though, need a more refined analysis of the partition function conditioned on configurations close to u and m.
In analogy to (60), (61), define the corresponding sets of configurations, and for a subset Σ ⊆ Ω of the configuration space denote by Z_G(Σ) the partition function restricted to configurations in Σ, that is, Z_G(Σ) = Σ_{σ∈Σ} B^{e_G(σ)}. We first prove the following weaker version of Lemma 23.
Lemma 26. Let Ψ be given by (13). For all sufficiently small ε > 0, there exists C > 0 such that, with probability 1 − o(1) over the choice of the graph G ∼ G(n, ∆), it holds that Z_G(T) ≤ exp(−Cn) exp(Ψn), together with corresponding lower bounds for the contributions from configurations close to u and m.

Proof of Lemma 26. Define the region T as the set of α ∈ △_q whose distance from each of u, m_1, . . . , m_q is at least ε. Recall from Theorem 4 that, for B = B_o, the global maximum of Ψ_1(α) occurs exactly when α is equal to one of u, m_1, . . . , m_q. It follows that max_{α∈T} Ψ_1(α) ≤ Ψ − C′ for some constant C′ = C′(ε) > 0. Note that for fixed n the possible values of α ∈ T are polynomially many. For all sufficiently large n, we thus have E_G[Z_G(T)] ≤ poly(n) exp((Ψ − C′)n). Let C be such that C′ > C > 0. By Markov's inequality, we obtain Z_G(T) ≤ exp(−Cn) exp(Ψn) with probability 1 − o(1).

Having established Lemma 26, we are ready to start arguing about the empirical distribution of edges, more precisely the fraction of edges in G whose endpoints are assigned colors i, j for configurations σ ∈ U, M. The rough idea is as follows. Fix an arbitrary α ∈ △_q. In Section 4, we established that Ψ_1(α) = max_x Υ_1(α, x) = (∆ − 1) f_1(α) + ∆ max_x g_1(x), where the latter maximization is over x satisfying the constraints (9). Since g_1(x) is strictly concave in the convex region on which it is defined, this maximum is attained at a unique vector x, which from here on we denote by x_α. Essentially by the same line of argument as in the proof of Lemma 26, all the contribution to the first moment E_G[Z^α_G] comes from those x which are close to x_α. Thus, by Markov's inequality and the lower bounds of Lemma 26, if x is sufficiently far from x_α, then, with probability 1 − o(1) over the choice of the graph G, the empirical edge distribution of a configuration σ ∼ µ_G equals x with exponentially small probability. We are thus left to argue that for those x close to x_α, the actual contribution to Z^α_G from configurations with edge empirical distribution x is close to its expectation. But for a graph satisfying Lemma 26 this is immediately guaranteed, since we know a lower bound on Z^α_G which matches its expectation up to a polynomial factor.
It is useful at this point to give the expressions for the optimal vector x_α when α is a dominant phase. Note that a dominant phase α corresponds (via (20)) to a fixpoint (R_1, . . . , R_q) of the tree recursions (6). The entries of the optimal vector x* = x_α are given by (64). Indeed, using that (R_1, . . . , R_q) specifies a fixpoint of the tree recursions (6), it can be checked that x* satisfies the constraints (9), and also that x* is a critical point of g(x). By the strict concavity of the function g_1, it follows that x* is the unique optimal vector, as claimed. Note that the expressions for E_m and E_u in (59) can easily be derived from (64) when adapted to the ferromagnetic Potts model.
We next introduce some relevant notation. We first want to capture the contribution of configurations with a specific edge empirical distribution. To do this, we need the notation x_G(σ) of Section 4, introduced just before the expression for the first moment (8).
Thus, the definition of Z_G(Σ, x_0) restricts the partition function not only to configurations belonging to Σ, but also to those having edge empirical distribution equal to x_0. Now, we further extend this definition to capture configurations whose edge empirical distribution is far from a prescribed vector x_0: for ε > 0, let Z_G(Σ, x_0, ε) denote the contribution from configurations in Σ whose edge empirical distribution is at distance at least ε from x_0. We are ready to prove the following.
Proof. We prove that the upper bound Z_G(U, x_u, ε) ≤ exp(−Cn) exp(Ψn) holds with probability 1 − o(1) over the choice of the graph. The remaining random variables may be treated similarly, and the claim then follows by a union bound.
Observe that, decomposing according to the vertex empirical distribution α = c(σ), it follows that

E_G[Z_G(U, x_u, ε)] = Σ_{α : ‖α − u‖_∞ ≤ ε} E_G[Z_G(Σ_α, x_u, ε)].   (65)
Note that the sum in (65) is over polynomially many vectors α satisfying ‖α − u‖_∞ ≤ ε. Further, for fixed α, E_G[Z_G(Σ_α, x_u, ε)] is also a sum over polynomially many x satisfying ‖x − x_u‖_∞ ≥ ε. For a fixed x, the exponential order of the term in the latter sum corresponding to x is given by the function Υ_1(α, x). By approximating the sums with their maximum terms (and using the continuity of the function Υ_1), it is standard to conclude from here that (66) holds. Note that the maximum in (66) is justified by standard compactness arguments (which we give below, since we will need them in the proof).
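The maximum-term approximation invoked for (66) is the standard fact that (1/n) log Σ_k exp(n f(k/n)) = max f + O(log n / n). A quick numerical illustration, with an arbitrary smooth f of our own choosing:

```python
import math

def log_sum_exp_over_grid(f, n):
    """Compute (1/n) * log( sum_{k=0..n} exp(n * f(k/n)) ) stably."""
    vals = [n * f(k / n) for k in range(n + 1)]
    m = max(vals)
    return (m + math.log(sum(math.exp(v - m) for v in vals))) / n

f = lambda x: x * (1.0 - x)          # maximized at x = 1/2 with value 1/4
approx = log_sum_exp_over_grid(f, 5000)
# The sum has n+1 terms, each at most exp(n * max f), so the normalized
# log-sum exceeds max f by at most log(n+1)/n.
err = approx - 0.25
```

Since the grid contains the maximizer x = 1/2 exactly, `approx` is sandwiched between max f and max f + log(n+1)/n.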
Since for all sufficiently small ε > 0 none of these maximizers lies in the region T(ε), we have Ψ′ < Ψ, and hence there exists C′(ε) > 0 such that Ψ′ ≤ Ψ − C′. By choosing C so that 0 < C < C′, the desired bound now follows from (65) by an application of Markov's inequality.
We are now ready to give the proof of Lemma 23.
Proof of Lemma 23. By a union bound, a graph G ∼ G(n, ∆) satisfies, with probability 1 − o(1), the conclusions of both Lemma 26 and Lemma 27. Now observe that the sets U, M defined in (60) and (61) satisfy the required estimates as a consequence of these bounds. The conclusion follows.
To complete the proofs for Section 10, we now give the proof of Claim 25.
Proof of Claim 25. Using (59), we obtain explicit expressions for E_m and E_u, and it follows that the claimed inequality can be rewritten in an explicit form.
Using the substitutions d = ∆ − 1, x = y^d and the second equation in (35) (for t = 1), the r.h.s. can be rewritten accordingly. Recall from the proof of Lemma 19 that for B = B_o it holds that y = y_o, where y_o = (q − 1)^{2/(d+1)}. Since y_o > 1, to prove the claim we only need to show that p(y_o) > 0, where p(y) := (q − 2)y^d − (q − 1)y^{d−1} − (q − 1). Massaging, we obtain the equivalent inequality (67). This is again a consequence of Descartes' rule of signs: the polynomial h_d(z) has exactly one positive root, say ρ_d, and hence h_d(z) > 0 for positive z is equivalent to z > ρ_d. It follows that to prove (67) for q ≥ 2(d + 1)/ln(d + 1), we only need to argue for its validity when q = q_o := 2(d + 1)/ln(d + 1). In other words, we need to show that h_d(z_o) > 0 where z_o := (q_o − 1)^{1/(d+1)}. By direct calculation, it can be checked that the inequality is true for d = 2, . . . , 9. We therefore assume that d ≥ 10 in what follows. For all w > 1 it holds that (w + 3/4)^2 > w^2 + w + 1, which will be used below. To handle the two factors in (68), we will use the following bounds on z_o.
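The positivity p(y_o) > 0 at the boundary value q = q_o = 2(d + 1)/ln(d + 1) can also be confirmed numerically (a sanity check of ours, not a substitute for the proof; larger q are handled by the monotonicity argument in the text):

```python
import math

def p_at_yo(d, q):
    """Evaluate p(y) = (q-2) y^d - (q-1) y^{d-1} - (q-1) at y_o = (q-1)^{2/(d+1)}."""
    y = (q - 1) ** (2.0 / (d + 1))
    return (q - 2) * y ** d - (q - 1) * y ** (d - 1) - (q - 1)

# Check the boundary case q = q_o = 2(d+1)/ln(d+1) for a range of d.
results = {d: p_at_yo(d, 2 * (d + 1) / math.log(d + 1)) for d in range(2, 60)}
```

Every value in `results` comes out strictly positive, in line with the claim.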

Small Subgraph Conditioning Method
In this section, we give the outline for the proof of Lemma 11. The proof is a minor modification of the arguments in [17, Appendices A & B] which were carried out for random ∆-regular bipartite graphs. Here, we just need to account for the non-bipartite case which turns out to be completely analogous. For completeness, we give the adaptation of the calculations therein to account for the slightly different setting.
The main tool we are going to use is the following theorem, due to [42]. The notation [X]_m refers to the m-th falling factorial of the variable X.
Theorem 28. For i = 1, 2, . . ., let λ_i > 0 and δ_i > −1 be constants, and assume that for each n there are random variables X_in, i = 1, 2, . . ., and Y_n, all defined on the same probability space G = G_n, such that X_in is non-negative integer valued, Y_n ≥ 0 and E Y_n > 0 (for n sufficiently large). Furthermore, suppose that the following hold:

(A1) X_in →_d Z_i as n → ∞, jointly for all i, where Z_i ∼ Po(λ_i) are independent Poisson random variables;

(A2) for every finite sequence j_1, . . . , j_m of non-negative integers, E[Y_n [X_1n]_{j_1} · · · [X_mn]_{j_m}] / E[Y_n] → Π_{i=1}^m (λ_i(1 + δ_i))^{j_i};

(A3) Σ_i λ_i δ_i^2 < ∞;

(A4) E[Y_n^2] / (E[Y_n])^2 ≤ exp(Σ_i λ_i δ_i^2) + o(1).

Let r(n) be a function such that r(n) → 0 as n → ∞. It holds that Y_n > r(n) E_G Y_n asymptotically almost surely.
To obtain Lemma 11, we verify the assumptions of Theorem 28 for the random variables Z^α_G. Recall that we restrict our attention to α which are Hessian dominant. For G ∼ G(n, ∆), let X_i = X_in be the number of cycles of length i in G, for i = 1, 2, . . ..
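Assumption (A1) can be illustrated by simulation (a sketch with our own helper names; it is a classical fact, not restated in this section, that in the pairing model the number of i-cycles is asymptotically Poisson with mean (∆ − 1)^i/(2i)). Here we count 1-cycles (self-loops) and 2-cycles (pairs of parallel edges):

```python
import random
from collections import Counter

def pairing_model(n, delta, rng):
    """Sample a uniform perfect matching on the delta*n points of the
    pairing (configuration) model; return the induced multigraph's
    edge list on the n vertices."""
    points = list(range(delta * n))
    rng.shuffle(points)
    edges = []
    for i in range(0, delta * n, 2):
        u, v = points[i] // delta, points[i + 1] // delta
        edges.append((min(u, v), max(u, v)))
    return edges

def short_cycle_counts(edges):
    """Count 1-cycles (self-loops) and 2-cycles (unordered pairs of
    parallel edges between the same two distinct vertices)."""
    loops = sum(1 for (u, v) in edges if u == v)
    mult = Counter((u, v) for (u, v) in edges if u != v)
    two_cycles = sum(m * (m - 1) // 2 for m in mult.values())
    return loops, two_cycles

rng = random.Random(0)
trials, n, delta = 2000, 100, 3
tot1 = tot2 = 0
for _ in range(trials):
    c1, c2 = short_cycle_counts(pairing_model(n, delta, rng))
    tot1 += c1
    tot2 += c2
mean_loops, mean_two_cycles = tot1 / trials, tot2 / trials
# For delta = 3: lambda_1 = (delta-1)/2 = 1 and lambda_2 = (delta-1)^2/4 = 1.
```

Both empirical means come out close to 1, matching the Poisson limits for ∆ = 3.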
The most technical part of this verification is assumption (A4) which requires computing the precise asymptotics of the moments. This in turn reduces to certain determinants which are not completely trivial. Nevertheless, the arguments have been carried out in full generality in [17]. The only minor modification required in the present case is to account for random ∆-regular graphs instead of the bipartite random ∆-regular graphs studied in [17].
We obtain the following lemmas.
Proof of Lemma 30. The proof is close to [17, Proof of Lemma A.6], which is in turn close to [40, Proof of Lemma 7.4]. We just modify the approach to account for the distribution induced by the pairing model. We make the minor notational change from X_i to X_ℓ; i.e., for ℓ ≥ 1, X_ℓ denotes the number of cycles of length ℓ in G. We show that Assumption (A2) in Theorem 28 holds when m = 1 and j_1 = 1; the extension to m > 1 and arbitrary indices j_1, . . . , j_m follows by standard arguments, see for example [31, Section 2] for an exposition of the argument in a very similar setting.
Let S = {S 1 , . . . , S q } be a partition of V such that |S i | = α i n for all i ∈ [q]. Note that S induces a configuration σ(S) by setting, for every vertex v ∈ V , σ(v) = i iff v ∈ S i . Denote by Y S the weight of the configuration σ(S).
Fix a specific partition S. By symmetry, it suffices to work with this fixed S. We decompose X_ℓ as follows: • ξ will denote a rooted and oriented ℓ-cycle whose vertices are colored with {1, . . . , q} (note that the coloring is not assumed to be proper). A vertex colored i in ξ signifies that it corresponds to a vertex in S_i.
• Once we have specified ξ, we use ζ to denote the ℓ points that the cycle traverses in order, such that the prescription of the vertex colors of ξ is satisfied. (Recall from Section 4 that points are elements of [∆n]; ζ specifies the pre-image of the cycle ξ in the pairing model, i.e., the matched points that respect the colors ξ of the cycle with respect to the partition S).
• 1 ξ,ζ is the indicator function whether the cycle specified by ξ, ζ is present in the graph G generated by the pairing model.
Once the vertex colors of a cycle have been specified, note that each possible cycle corresponds to exactly 2ℓ different configurations ξ (the number of ways to root and orient the cycle). For each of those ξ, the respective sets of configurations ζ are the same. Hence, we may write X_ℓ = (1/(2ℓ)) Σ_{ξ,ζ} 1_{ξ,ζ}, which after taking expectations yields (71). In light of (71), we need to study the ratio E[Y_S | 1_{ξ,ζ} = 1]/E[Y_S]. At this point, to simplify notation, we may assume that ξ, ζ are fixed.
We have shown in Section 4 an expression (72) for E[Y_S], where the variables x = (x_11, . . . , x_qq) capture the number of edges between the different color classes in S. In particular, for i ≠ j, ∆x_ij n is the number of edges between the sets S_i and S_j, whereas ∆x_ii n/2 is the number of edges within the set S_i (cf. Section 4 for more details).
To calculate E[Y_S | 1_{ξ,ζ} = 1], we need some notation. For colors i, j ∈ {1, . . . , q}, let a′_ij be the number of edges in ξ whose one endpoint has color i and the other j. It will be convenient to denote a_ii := 2a′_ii and a_ij := a′_ij whenever i ≠ j. Finally, let c_i denote the number of vertices in ξ colored with i. The following equalities are immediate: Σ_i c_i = ℓ and Σ_j a_ij = 2c_i for each i. We are almost set to compute E[Y_S | 1_{ξ,ζ} = 1]. We denote by x the same set of variables as in (72); the number of i–j edges now includes the a_ij edges prescribed by ξ, ζ. To make the following formulas easier to digest, let n∆x′_ij = n∆x_ij − a_ij. Using that for constants c_1, c_2 > 0 it holds that (c_1 n − c_2)!/(c_1 n)! = (1 + o(1))(c_1 n)^{−c_2}, we obtain the asymptotics of the ratio of multinomial coefficients (∆α_i n − 2c_i; ∆x′_i1 n, . . . , ∆x′_iq n) over (∆α_i n; ∆x_i1 n, . . . , ∆x_iq n). The asymptotics of the ratio E[Y_S | 1_{ξ,ζ} = 1]/E[Y_S] are determined by those x* which maximize Υ_1(α, x). For given ξ, the number of possible ζ in the pairing model is asymptotically determined as well; in the last equality we used that a_ii = 2a′_ii, a_ij = a′_ij for i ≠ j, a′_ij = a′_ji and x*_ij = x*_ji. Note that the r.h.s. evaluates to 0 whenever there exist i, j such that B_ij = 0 but a_ij ≠ 0, since then we have x*_ij = 0 (cf. (64)). This is in complete accordance with the fact that the configuration induced by the partition S then has zero weight. Thus, by (71), we may write the ratio as a sum over a′ = {a′_11, . . . , a′_qq}, where N_{a′} is the number of possible ξ with a′_ij edges of type {i, j}. Using (64), we have that x*_ij/√(α_i α_j) is equal to the (i, j)-entry of the matrix M. Thus, the sum can be reformulated as the (multiplicative) weight of closed walks in a weighted multigraph whose (weighted) adjacency matrix is given by M (for more details on the technique see [27]). It thus follows that the sum equals Tr(M^ℓ) = 1 + Σ_{j=1}^{q−1} µ_j^ℓ.
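The last step uses the identity that the total weight of rooted, oriented closed walks of length ℓ in a weighted multigraph with adjacency matrix M equals Tr(M^ℓ) = Σ_j µ_j^ℓ. A quick check with an arbitrary small symmetric matrix (names ours):

```python
from itertools import product

def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_power(M, ell):
    """Tr(M^ell) via repeated matrix multiplication."""
    P = M
    for _ in range(ell - 1):
        P = mat_mult(P, M)
    return sum(P[i][i] for i in range(len(M)))

def closed_walk_weight(M, ell):
    """Sum over all rooted, oriented closed walks v_0 -> v_1 -> ... -> v_0
    of length ell of the product of edge weights along the walk."""
    n = len(M)
    total = 0.0
    for walk in product(range(n), repeat=ell):
        w = 1.0
        for i in range(ell):
            w *= M[walk[i]][walk[(i + 1) % ell]]
        total += w
    return total

M = [[0.5, 0.3, 0.2],
     [0.3, 0.4, 0.1],
     [0.2, 0.1, 0.6]]
lhs = trace_power(M, 4)
rhs = closed_walk_weight(M, 4)
```

The two quantities agree up to floating-point rounding, as the walk sum is exactly the expansion of the trace of the matrix power.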
The fact that the µ i 's are positive follows from the fact that B is a positive definite matrix, while the fact that the µ i 's are less than 1/(∆ − 1) follows from the results of Section 6.
Lemma 31. Assumption (A3) of Theorem 28 holds in our setting, where the µ_i's are as in Lemma 30.

Proof of Lemma 31. The claim follows by a direct computation, using that the relevant series over i ≥ 1 converges.

Lemma 32. For a ferromagnetic model, for all Hessian dominant α, assumption (A4) of Theorem 28 holds, where the µ_i's are as in Lemma 30.
Proof of Lemma 11. Lemmas 29-32 verify the assumptions of Theorem 28. The lemma thus follows by applying Theorem 28, for r(n) = 1/n.

The asymptotics of the moments
We follow closely the proof in [17, Appendix B], where very similar asymptotics are computed in detail. We first overview the approach of [17, Appendix B] in our setting. The asymptotics of E_G[Z^α_G] and E_G[(Z^α_G)^2] are derived by first rewriting the sums in (8) and (10) as integrals and approximating the latter with Gaussian integrals. The principle behind the technique is the negative-definiteness of the Hessian at the maximizers of the functions Υ_1 and Υ_2, which control the exponential order of the terms in the sums (8) and (10), respectively. This allows us to focus on terms within O(1/√n) distance of the term with the maximum contribution. A thorough exposition of the technical details can be found in [28, Section 9.4]. Carrying out the above scheme in our setting is impeded by the fact that the sums in (8) and (10) are over variables which are linearly dependent. We will get rid of this linear dependence in the simplest way: for each of the two sums, we pick a subset S of the variables (of minimum cardinality) such that every variable is a (non-trivial) linear combination of variables in S. Variables in S span a full-dimensional space, inducing what we call a "full-dimensional representation" of the functions Υ_1, Υ_2 when these are viewed as functions of the variables in S. The inconvenience that this procedure causes is that, in the calculation of the Gaussian integrals, (the determinants of) the Hessian matrices of the full-dimensional representations of Υ_1, Υ_2 come into play.
In [17, Appendix B.1.1], the above setting is abstracted as follows: given a linear subspace {z : Az = 0}, compute (the determinant of) the Hessian matrix of the full-dimensional representation of a function Υ(z). It is not hard to see that a full-dimensional representation of {z : Az = 0}, assuming that A has row rank r, is obtained by first picking a submatrix A_f induced by r linearly independent rows of A, and then picking r columns of A_f to obtain an r × r invertible submatrix A_fs (the variables corresponding to the columns of A_fs can be written as non-trivial linear combinations of the remaining variables; the latter yield the full-dimensional representation). We denote by H_f the Hessian matrix of the full-dimensional representation of Υ induced by the matrices A_f, A_fs. We further denote by H the Hessian matrix of Υ(z) (where z is now assumed to be unconstrained); note that H is diagonal. The following is proved in [17].
For a polynomial p(s), [s^t]p(s) denotes the coefficient of s^t in p(s).

Remark 9. When A has full row rank, i.e., r = m, one can take T to be the identity matrix, in which case the r.h.s. in (76) simplifies.

We will apply Lemma 33 in Sections 11.2.1 and 11.2.2 to calculate the asymptotics of the moments. To do this, we will need more information on the maximizers of the functions Υ_1 and Υ_2 for a Hessian dominant phase α. In particular, let x* be the maximizer of Υ_1(α, x) (cf. (64)) and (γ*, y*) be the maximizer of Υ_2(γ, y). Adapting the proof of [17, Lemma 3.2], we have that γ*_ij = α_i α_j for all i, j ∈ [q], from which it easily follows that y*_ikjl = x*_ij x*_kl.

Following [17, Appendix B.1.2], we use the following notation in Sections 11.2.1 and 11.2.2. For a vector z ∈ R^n we denote by z_D the n × n diagonal matrix whose i-th diagonal entry equals z_i for i ∈ [n]. For vectors z_j ∈ R^{m_j}, j = 1, . . . , t, we denote by [z_1, . . . , z_t]^⊺ the vector in R^{Σ_j m_j} which is the concatenation of the vectors z_1, . . . , z_t. I_n will denote the n × n identity matrix and 0 will denote the all-zeros matrix whose dimensions will be inferred from context. For matrices M_1 and M_2, M_1 ⊗ M_2 will denote the Kronecker product of M_1 and M_2.

11.2.1 Proof of (74)

The first moment E_G[Z_G^α] is a sum over x (and α is fixed). Note that if B_ij = 0, then we may restrict the sum in (8) to those x which satisfy x_ij = 0 without changing the sum's value. Further, since x_ij = x_ji and B_ij = B_ji, we may write the sum in (8) in terms of those x_ij with i ≤ j. Let P_1 := {(i, j) | i ≤ j, B_ij > 0}. Henceforth x will denote {x_ij}_{(i,j)∈P_1}. The observations above imply that the sum in (8) can be written over the possible values of the vector x. Note that for ferromagnetic models we have that (i, i) ∈ P_1 for every i ∈ [q] (cf. the discussion after Theorem 7). We are left to account for the linear dependencies induced by the q constraints α_i = Σ_j x_ij.
In matrix form, we can write these constraints as A_1 x = α, where A_1 is a {0, 1}-matrix of dimension q × |P_1|. For a ferromagnetic model, as we shall show shortly, the rank of A_1 is q (this holds more generally for matrices B which are irreducible and aperiodic, see for example footnote 7). To get a full-dimensional representation of the space (78), we will eliminate q variables from the vector x. This corresponds to picking q columns of A_1 which induce a q × q invertible submatrix of A_1. We will denote this submatrix by A_{1,s}. For ferromagnetic models, we can choose the columns corresponding to the variables x_ii for i ∈ [q], in which case A_{1,s} is simply the identity matrix. 7 Adapting the proof of [17, Lemma B.3] yields the following asymptotics for the first moment. The details of the proof can be found in Appendix B.
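This choice of columns is easy to sanity-check numerically. The following Python sketch (a hypothetical small example, assuming B has full support so that P_1 consists of all pairs i ≤ j) builds the constraint matrix for the system α_i = Σ_j x_ij and verifies that it has rank q and that the columns of the variables x_ii induce the identity matrix:

```python
import numpy as np
from itertools import combinations_with_replacement

q = 4
# Variables x_ij with i <= j; constraint i reads alpha_i = sum_j x_ij,
# so x_ij with i < j contributes to rows i and j, and x_ii only to row i.
pairs = list(combinations_with_replacement(range(q), 2))
A1 = np.zeros((q, len(pairs)))
for col, (i, j) in enumerate(pairs):
    A1[i, col] = 1
    if i != j:
        A1[j, col] = 1

assert np.linalg.matrix_rank(A1) == q     # A_1 has (full) rank q

# The columns of the "diagonal" variables x_ii form the q x q identity matrix
diag_cols = [pairs.index((i, i)) for i in range(q)]
assert np.array_equal(A1[:, diag_cols], np.eye(q))
```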
Lemma 34. For a ferromagnetic model, 8 it holds that where H^f_{1,x} is the Hessian of the full-dimensional representation of g_1(x) evaluated at x = x*.
To expand the determinant in (79), we apply Lemma 33 and in particular Remark 9. This yields (80), where H_{1,x} is the |P_1| × |P_1| diagonal matrix corresponding to the Hessian matrix of g_1(x) (when x is unconstrained); in the second equality in (80) we used that Det(A_{1,s}) = 1 (by our choice of A_{1,s}).
Since H_{1,x} is diagonal, we obtain (81). To show (81), it can be checked that (82) holds, where α_D is the q × q diagonal matrix whose i-th diagonal entry is α_i and S_x is the q × q symmetric matrix whose (i, j) entry (for i ≤ j) is x*_ij whenever (i, j) ∈ P_1 and 0 otherwise. Observe that (α_D)^{−1/2} S_x (α_D)^{−1/2} = M, where M is the matrix in Lemma 30. Since the spectrum of M is {1, µ_1, . . . , µ_{q−1}}, it follows that the spectrum of the matrix I_q + M is {2, 1 + µ_1, . . . , 1 + µ_{q−1}}. This yields (81), thus completing the proof of (74).
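The eigenvalue-shift step can be checked numerically. In the following Python sketch a randomly generated symmetric matrix stands in for S_x (it is not the paper's actual data); the congruence by α_D^{−1/2} produces a symmetric matrix M, and the spectrum of I_q + M is the spectrum of M shifted by 1, so the determinant is the product of the shifted eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
q = 5
alpha = rng.random(q) + 0.5            # positive "phase" entries (illustrative)
S = rng.random((q, q))
S = (S + S.T) / 2                      # symmetric matrix standing in for S_x
Dhalf = np.diag(alpha ** -0.5)
M = Dhalf @ S @ Dhalf                  # symmetric, hence real spectrum

mu = np.sort(np.linalg.eigvalsh(M))
shifted = np.sort(np.linalg.eigvalsh(np.eye(q) + M))
assert np.allclose(shifted, 1 + mu)                                # spectrum shifts by 1
assert np.allclose(np.linalg.det(np.eye(q) + M), np.prod(1 + mu))  # det = product of shifts
```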
and an edge labelled (i, j) corresponds to the column labelled (i, j) in A_1 (note that H has a self-loop on vertex i iff B_ii > 0). Then A_{1,s} specifies a subgraph H′ of H with exactly q edges. It can be shown that A_{1,s} is invertible if H′ is spanning (i.e., every vertex in H′ has non-zero degree) and all of the connected components of H′ are unicyclic and non-bipartite (i.e., every connected component of H′ has a unique cycle of odd length, where self-loops count as cycles of length 1).

8 We briefly comment on how the choice of the full-dimensional representation (i.e., the choice of A_{1,s}) has been used in the derivation of (79). Relative to footnote 7, if the invertible submatrix A_{1,s} corresponds to a subgraph with exactly c components which contain a non-trivial odd cycle (i.e., an odd cycle of length ≥ 3), there is a correction factor 2^{−c} in the r.h.s. of (80). The factor comes from (mod 2) constraints imposed by considering the sum of the constraints in (78) corresponding to the vertices in each such unicyclic component (in the derivation below, this factor cancels with the factor |Det(A_{1,s})| coming from Det(−H^f_{1,x})^{−1/2}; it can be shown that |Det(A_{1,s})| = 2^c). Note that for our choice of A_{1,s}, c equals zero, since the subgraph induced by the columns of A_{1,s} consists of q components, each of which is a single vertex with a self-loop.

11.2.2 Proof of (75)
For the second moment, E_G[(Z_G^α)^2] is a sum over γ, y while α is fixed. Analogously to (77), define the index set P_2; henceforth, y will denote {y_ikjl}_{(i,k,j,l)∈P_2}. The constraints in (11) can now be written in matrix form, where A_{2,γ}, A_{2,y} are {0, 1}-matrices of dimensions 2q × q^2 and q^2 × |P_2|, respectively. It is easy to see that A_{2,y} has full row rank r_y = q^2, while A_{2,γ} has rank r_γ = 2q − 1, so that the rank of A_2 is r_2 = r_y + r_γ = q^2 + 2q − 1. Thus, to specify a full-dimensional representation of (84), we need to specify an r_2 × r_2 invertible submatrix A_{2,s} of A_2. It can be checked that any such submatrix A_{2,s} of A_2 must have block-diagonal form, where A^s_{2,γ}, A^s_{2,y} are r_γ × r_γ and r_y × r_y invertible submatrices of A_{2,γ}, A_{2,y}, respectively. Thus, we only need to specify the matrices A^s_{2,γ}, A^s_{2,y}. We will choose A^s_{2,γ} to be an arbitrary invertible submatrix of A_{2,γ}; since A_{2,γ} is totally unimodular (it corresponds to the incidence matrix of the complete bipartite graph with q vertices on each side), we have Det(A^s_{2,γ})^2 = 1. For ferromagnetic models, we can choose A^s_{2,y} to be the identity matrix using the columns corresponding to the variables y_ikik with i, k ∈ [q], so Det(A^s_{2,y})^2 = 1. It follows that Det(A_{2,s})^2 = 1. For future use (with the scope of applying Lemma 33), let A_{2,f} be the submatrix of A_2 induced by the rows corresponding to the rows of A_{2,s}.
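The rank and unimodularity facts about A_{2,γ} are easy to check numerically for small q. The sketch below (a hypothetical Python example; the spanning-tree submatrix chosen here is for illustration and is not necessarily the choice made in the text) builds the incidence matrix of K_{q,q} and verifies its rank 2q − 1 and a ±1 determinant for a spanning-tree submatrix:

```python
import numpy as np

q = 4
# Incidence matrix of the complete bipartite graph K_{q,q}: rows are the 2q
# vertices (left then right), columns are the q^2 edges (i, j).
A = np.zeros((2 * q, q * q))
for col in range(q * q):
    i, j = divmod(col, q)
    A[i, col] = 1          # left endpoint i
    A[q + j, col] = 1      # right endpoint j

assert np.linalg.matrix_rank(A) == 2 * q - 1   # rank of a connected bipartite incidence matrix

# A spanning tree (a "double star": edges (0, j) for all j and (i, 0) for i >= 1)
# gives 2q - 1 columns; dropping one redundant row yields an invertible
# submatrix with determinant +-1, consistent with total unimodularity.
tree_cols = list(range(q)) + [i * q for i in range(1, q)]
sub = A[1:, tree_cols]
assert sub.shape == (2 * q - 1, 2 * q - 1)
assert abs(round(np.linalg.det(sub))) == 1
```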
We have the following analogue of Lemma 34. The proof is given in Appendix B.
Denote by H_2 the diagonal matrix corresponding to the Hessian matrix of Υ_2(γ, y)/∆ (when γ, y are unconstrained). Note that we may decompose H_2 as [H_{2,γ} 0; 0 H_{2,y}], where H_{2,γ} is the q^2 × q^2 diagonal matrix corresponding to the Hessian matrix of (∆ − 1)f_2(γ)/∆ and H_{2,y} is the |P_2| × |P_2| diagonal matrix corresponding to the Hessian matrix of g_2(y) (see (14) for the specification of the functions f_2 and g_2).
We next apply Lemma 33 with the matrix T_2 (recall that α_D is the q × q diagonal matrix whose i-th diagonal entry is α_i and 0_{q^2} is the q^2 × q^2 all-zeros matrix), to obtain the following equality, where in the latter equality we used that Det(A_{2,s})^2 = 1 (which was proved earlier) and L(A_2, A_{2,f}, T_2) = 1/2 (which follows from [17, Proof of Lemma B.8]). We then calculate the relevant determinant, so that (75) will follow from (87).

We first write out the block structure of εT_2 − A_2 (H_2)^{−1} A_2^⊺. First, we have the following analogue of (82), where S_γ is the q × q matrix whose (i, j) entry is γ*_ij and S_y is the q^2 × q^2 matrix whose ((i, k), (j, l)) entry is y*_ikjl. From this, we obtain a decomposition in terms of a matrix H′_2, where the matrices W, V are given in terms of M (recall that M is the matrix in Lemma 30, whose eigenvalues are 1, µ_1, . . . , µ_{q−1}).

In light of (89), it suffices to compute Det(H′_2). To do this, we proceed by taking the Schur complement of the matrix W. It is easy to see that W is invertible, since its spectrum can be written out explicitly, where t := 1/(∆ − 1). Considering the Schur complement of the matrix W, we obtain an expression for Det(H′_2) in terms of Det(ε((∆ − 1)/∆) I_{2q} + Z). To compute this determinant, we need to obtain a simpler form for Z. The following lemma, which is proved at the end of this section, will allow for such a simplification.
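The Schur-complement determinant identity used in this step is standard: for an invertible top-left block W, Det([W V; V^⊺ X]) = Det(W) Det(X − V^⊺ W^{−1} V). A minimal numerical check (Python, with random matrices, not the paper's W, V):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 4, 6
W = rng.random((n1, n1)) + n1 * np.eye(n1)   # well-conditioned, invertible top-left block
V = rng.random((n1, n2))
X = rng.random((n2, n2))
M = np.block([[W, V], [V.T, X]])

# Schur complement of W in the block matrix M
schur = X - V.T @ np.linalg.inv(W) @ V
assert np.allclose(np.linalg.det(M), np.linalg.det(W) * np.linalg.det(schur))
```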
It is standard to express the eigenvalues of (1/(∆ − 1)) I_{2q} − M′ in terms of the eigenvalues of M and hence obtain that the former matrix is invertible (since the eigenvalues of M other than 1 are less than 1/(∆ − 1) in absolute value). Thus, Lemma 36 gives (92). By (91), Z is trivially symmetric. Using (92), we obtain the eigenvalues of Z.
Lemma 37. The spectrum of Z is given as follows. Note that u_1, u_2 are linearly independent eigenvectors of M′ corresponding to the eigenvalue 1.
Using (88), we have that VV^⊺ = [I_q S′_γ; S′_γ I_q], where S′_γ is the q × q matrix whose (i, j) entry is √(α_i α_j). It follows that u_1 and u_2 are eigenvectors of VV^⊺ with eigenvalues 2 and 0, respectively, and hence u_1 and u_2 are eigenvectors of Z with eigenvalues 2f(1) and 0, respectively.
Let u be an eigenvector of M′ corresponding to an eigenvalue µ ≠ 1. Note that u is perpendicular to both u_1 and u_2. It follows that VV^⊺ u = u, so that Zu = f(µ)u. Thus, u is also an eigenvector of Z with eigenvalue f(µ).
To simplify the expressions, set r = (∆ − 1)/∆. The matrix εrI_{2q} shifts the eigenvalues of Z by εr; thus, Lemma 37 yields an expression for Det(εrI_{2q} + Z) as the product of the shifted eigenvalues, the factors εr and εr + 2f(1) among them. We have f(1), f(µ_i) ≠ 0 for every i ∈ [q − 1]. Plugging (90) and (93) into (91), and using (89), we obtain (87), as wanted.

We conclude by giving the deferred proof of Lemma 36. The matrices D, E clearly have the same dimensions, since V has dimensions 2q × q^2, N has dimensions q^2 × q^2 and M′ has dimensions 2q × 2q. It remains to check that the entries of D, E are equal. First, we give explicit expressions for the entries of V, N. We then consider the (i, (j, l)) entries of the matrices D, E, first for i ≤ q. An analogous calculation for q < i ≤ 2q yields that D_{i,(j,l)} = E_{i,(j,l)} for every i, j, l.

Bethe Prediction for General Models on Random Regular Graphs
In this section, we show how to extend Theorem 7 to general models on random regular graphs, as discussed in Section 3.1. A more general result has been derived in [13, Theorem 1.16] for sequences of graphs converging locally to (random) trees, under the assumption of uniqueness of the Gibbs measure on the underlying tree. For the special case of random ∆-regular graphs, we show how to extend Theorem 7 when there is a unique semi-translation invariant Gibbs measure. Our proof has a different perspective and yields a slightly simpler condition for random ∆-regular graphs. Semi-translation invariant Gibbs measures on T_∆ are Gibbs measures that are invariant under parity-preserving automorphisms of T_∆ (cf. [6]). They can be specified by a pair of probability vectors (α, β) for the even and odd vertices, respectively. Note that if there is a unique semi-translation invariant measure, then this measure is also translation invariant; hence, it corresponds to a fixpoint of the tree recursions (6).
Theorem 38. Let B be a regular matrix which specifies a model such that Z_G > 0 for all ∆-regular graphs G. If there is a unique semi-translation invariant Gibbs measure on T_∆ and the corresponding fixpoint is Jacobian attractive, then: where G is the uniform distribution on ∆-regular graphs with n vertices.
The first assumption in the theorem mainly avoids pathological cases where log Z_G ≡ −∞, in which case the quantities are not well-defined. It is satisfied by many classes of models, e.g., permissive models ([13]) such as the hard-core and antiferromagnetic Potts models, or even non-permissive models such as q-colorings when q ≥ ∆ + 1.
The proof of Theorem 38 is analogous to that of Theorem 7, once we establish the analog of Theorem 6 for general models. As we illustrated in Remark 2, this is hopeless to achieve in general, and we must thus use the uniqueness assumption that Theorem 38 requires. Note that if there is a unique semi-translation invariant measure (which is the assumption in Theorem 38), then this measure is also translation invariant.
Proof of Theorem 38. Let α* be a dominant phase. By the uniqueness of the semi-translation invariant measure, α* is unique. We next describe how to obtain the analog of Theorem 6 under the assumptions of Theorem 38. Let p = ∆/(∆ − 1). We show that whenever there is a unique semi-translation invariant Gibbs measure on T_∆, it holds that exp(2Ψ_1(α*)/∆) = ‖B‖_{p→∆}.
From (16) and (21), we obtain (94). Note that the last inequality is trivial; we just enlarged the maximization region. It is proved in [17] that the maximum of the r.h.s. is achieved at a semi-translation invariant fixpoint. If there is a unique semi-translation invariant Gibbs measure on T_∆, this must be translation invariant, and hence the maximum in the r.h.s. of (94) must occur at R = C. We thus obtain that (94) holds with equality, and hence Ψ_2(α*) = 2Ψ_1(α*), as wanted.
Since the dominant phase α* corresponds to a Jacobian attractive fixpoint (by assumption), it is also Hessian dominant (see Remark 4). With minor modifications (see footnotes 7 and 8), the results of Section 11.1 can be adapted to obtain a lower bound on Z_G^α as in Lemma 11. Thus, the proof of Theorem 7 in Section 7.1 extends to the present setting as well.

A Non-Reconstruction for the Ordered Phases on the Tree
In this appendix, we give the detailed proof of the doubly exponential upper bound in (56). The appendix is organized as follows. In Appendix A.1, we review broadcasting processes on trees, the non-reconstruction property and a concentration result from [46]. In Appendix A.2, we review relevant connections between broadcasting processes and Gibbs measures defined by fixpoints of the tree recursions (6), which will allow us to apply the result of [46]. Finally, in Appendix A.3, we apply these results to the ferromagnetic Potts model and obtain the bound in (56).

Let us fix some notation that will be used throughout this section. We will denote by T = (V, E) the infinite (∆ − 1)-ary tree. The root of T will be denoted by ρ. For an integer ℓ ≥ 0, T_ℓ will denote the subtree of T consisting of the first ℓ levels of T, and W_ℓ will denote the set of leaves of T_ℓ. Further, for a configuration σ : V → [q], we denote by σ_{W_ℓ} the restriction of σ to W_ℓ.

A.1 Non-Reconstruction in Broadcasting Processes on Trees
Let q ≥ 2 be an integer and M = (M_ij)_{i,j∈[q]} be a q × q stochastic matrix (i.e., its entries are non-negative and each row sums to 1). We will further assume that M is irreducible and aperiodic, so that there exists a unique q-dimensional probability vector π* = (π*_i)_{i∈[q]} such that π*M = π*. Note that the entries of π* are all positive. We will refer to π* as the stationary distribution of M. We will also assume that M is reversible with respect to π*, i.e., π*_i M_ij = π*_j M_ji for all i, j ∈ [q] (every such matrix is similar to a symmetric matrix and thus has real eigenvalues).
Let π = (π_i)_{i∈[q]} be a q-dimensional probability vector with positive entries (note that π need not equal π*). The broadcasting process M on the tree T = (V, E) with root ρ is a probability distribution ν on the set of assignments σ : V → [q], generated as follows: first pick the spin of the root randomly from the distribution π and then broadcast the spin down the tree, where each edge of the tree acts as a noisy channel. In particular, for an edge (u, v) of the tree where u is the parent of v, conditioned on the spin σ(u), the spin σ(v) is picked randomly from the distribution (M_{σ(u),1}, . . . , M_{σ(u),q}).

We next define the non-reconstruction property, which roughly captures whether, as we go deeper into the tree, the information about the spin of the root vanishes. (For distributions µ_1, µ_2 defined on the same space Ω, we denote by d_TV(µ_1, µ_2) the total variation distance between µ_1 and µ_2.)

Definition 2 (Non-Reconstruction). A broadcasting process M has the non-reconstruction property on the tree T if, for all colors i, j ∈ [q],

lim_{ℓ→∞} d_TV( ν(σ_{W_ℓ} = · | σ_ρ = i), ν(σ_{W_ℓ} = · | σ_ρ = j) ) = 0.

Non-reconstruction is often closely connected to the second largest eigenvalue of M. We will use the following concentration result of [46], which can be interpreted as quantifying the rate of convergence to 0 when the second largest eigenvalue of M is small relative to the branching factor of the tree.
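To see the role of the second eigenvalue concretely, consider the broadcast along a single downward path: the conditional distribution of the spin at depth L given root spin i is the i-th row of M^L, and the total variation distance between rows measures the surviving root information. The Python sketch below uses an illustrative 3-state channel (not the Potts matrix of the paper) and checks that this distance decays at rate |λ|:

```python
import numpy as np

# Illustrative reversible channel: M = 0.4*I + 0.2*J, with eigenvalues 1, 0.4, 0.4
M = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.6, 0.2],
              [0.2, 0.2, 0.6]])

def max_tv_between_rows(P):
    # max over pairs of rows of the total variation distance
    q = P.shape[0]
    return max(0.5 * np.abs(P[i] - P[j]).sum() for i in range(q) for j in range(q))

tv = [max_tv_between_rows(np.linalg.matrix_power(M, L)) for L in range(1, 6)]
lam = 0.4   # second largest eigenvalue of this M in absolute value

assert all(tv[k + 1] <= tv[k] for k in range(len(tv) - 1))   # information decays with depth
assert abs(tv[-1] / tv[-2] - lam) < 1e-9                     # geometric decay at rate |lambda|
```

On the tree, the branching factor ∆ − 1 multiplies the number of such paths, which is why the condition (∆ − 1)λ^2 < 1 enters the quantitative statement below.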
Theorem 39 ([46, Theorem 2.3], see also [34, 35]). Consider a broadcasting process M on the infinite (∆ − 1)-ary tree with no hard constraints (i.e., all entries of M are positive), whose spin at the root is chosen according to some distribution π with positive entries. Let λ be the second largest eigenvalue of M in absolute value. Then, if M has non-reconstruction and (∆ − 1)λ^2 < 1, there exist constants C > 0 and ℓ_0 ≥ 1 such that the following holds.

We remark here that the restriction in Theorem 39 that M has no hard constraints is not needed; in fact, in [46], the analogous statement is proved for general models M whose state space satisfies a general connectivity condition. Since we will only apply the result of [46] to the ferromagnetic Potts model (which has no hard constraints), such connectivity issues are not present in our setting and are thus out of our scope. In particular, in the language/notation of [46], all colors c, c′ are trivially compatible in our setting, and thus the measure µ_c(·) in [46, Theorem 2.3], which conditions the broadcasting process on the space of configurations where the "parent of the root" has color c, is identical to the unconditioned broadcasting process (denoted by ν in our setting). Further, [46, Theorem 2.3] is stated for the case π = π*, i.e., when the distribution of the spin of the root ρ is chosen according to the stationary distribution of M. We next show how to derive from this the slightly more general version stated in Theorem 39.
In particular, suppose that Theorem 39 is true for some distribution π. Our goal is to show that it also holds for any other distribution π′ (we assume that both π and π′ have positive entries), for some constants C′, ℓ′_0 > 0. We will denote by ν the broadcasting process when the initial distribution is π and by ν′ the one when the initial distribution is π′. We will also use B_ℓ and B′_ℓ to denote the sets of "bad" configurations on W_ℓ for the two processes ν, ν′, respectively (see Theorem 39).
Denote also by z′_i(η) the respective quantity for the measure ν′. Since both processes have the same broadcasting matrix, observe that for any colors i, j ∈ [q] it holds that

A.2 Broadcasting Processes and Fixpoints of the Tree Recursions
In light of Theorem 39, our strategy for proving the bound in (56) will be to show that the measure ν_i (corresponding to the i-th ordered phase in the Potts model) corresponds to a broadcasting process on the (∆ − 1)-ary tree (and then simply verify the assumptions of the theorem). The purpose of this section is to make this correspondence explicit. In fact, we will work out the relevant connections for general spin models.

Let B be the interaction matrix of a q-spin system. As in Section 1.2, we assume that B is symmetric, irreducible and aperiodic. For an integer ∆ ≥ 3, recall that a fixpoint of the tree recursions is a vector R = (R_1, . . . , R_q) with positive entries such that R_i ∝ (Σ_j B_ij R_j)^{∆−1} for all i ∈ [q]. For the purpose of this section, we assume that the normalization in (6) is such that Σ_i R_i = 1, i.e., R is a q-dimensional probability vector.

We next define the broadcasting process corresponding to the fixpoint R by first specifying an appropriate broadcasting matrix. In particular, let M be the q × q matrix whose (i, j)-entry is given by

M_ij = B_ij R_j / Σ_k B_ik R_k.    (97)

We remark here that the normalization of the R_i's in (6) is not important for defining the matrix M (the entries remain unchanged if we scale the R_i's); we normalize R to be a probability vector so that we can use it as the initial distribution π of the spin of the root in the broadcasting process. In particular, in the notation of Appendix A.1, we will set π = R. This completes the specification of the broadcasting process (cf. (95)). Note that M is stochastic, irreducible and aperiodic. Further, its stationary distribution π* is given by the probability vector whose entries satisfy π*_i ∝ R_i Σ_j B_ij R_j for all i ∈ [q]. Finally, M is reversible with respect to π*.

In the rest of this section, we state several results that will eventually allow us to apply Theorem 39. First, we connect the spectral properties of M with the attractiveness of the fixpoint R of the tree recursions (see Section 3.2 for the relevant definitions).
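A minimal numerical sketch of this correspondence (Python, for an illustrative 2-state ferromagnetic interaction matrix in the uniqueness regime; the recursion R_i ∝ (Σ_j B_ij R_j)^{∆−1} and the formula M_ij = B_ij R_j / Σ_k B_ik R_k are our reading of (6) and (97)) finds the fixpoint by iteration and checks that M is stochastic and reversible with respect to π*:

```python
import numpy as np

def tree_fixpoint(B, Delta, R0, iters=2000):
    # Iterate the tree recursions R_i ∝ (sum_j B_ij R_j)^(Delta-1),
    # normalized to a probability vector.
    R = R0.copy()
    for _ in range(iters):
        R = (B @ R) ** (Delta - 1)
        R /= R.sum()
    return R

B = np.array([[1.5, 1.0],
              [1.0, 1.5]])                # small ferromagnetic example (illustrative)
Delta = 3
R = tree_fixpoint(B, Delta, np.array([0.7, 0.3]))

new = (B @ R) ** (Delta - 1); new /= new.sum()
assert np.allclose(new, R)                # R is indeed a fixpoint

M = (B * R) / (B @ R)[:, None]            # M_ij = B_ij R_j / sum_k B_ik R_k
pi = R * (B @ R); pi /= pi.sum()          # pi*_i proportional to R_i * sum_j B_ij R_j

assert np.allclose(M.sum(axis=1), 1)                          # M is stochastic
assert np.allclose(pi @ M, pi)                                # pi* is stationary
assert np.allclose(pi[:, None] * M, (pi[:, None] * M).T)      # reversibility
```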
Lemma 40. Let R be a Jacobian attractive fixpoint of the tree recursions and let M be the broadcasting matrix corresponding to R. Let λ be the second largest eigenvalue of M in absolute value. Then (∆ − 1)λ < 1.
Proof. Recall from Section 6 (see also the beginning of Section 8) that R is a Jacobian attractive fixpoint of the tree recursions if every eigenvalue x ≠ 1 of the associated matrix, call it M̃, satisfies (∆ − 1)|x| < 1. The result will thus follow by showing that the eigenvalues of M are identical to those of M̃. We will show that M and M̃ are similar matrices, thus proving the result. Let A be the diagonal matrix whose i-th diagonal entry is given by √α_i. Note that A is invertible (since the R_i's are positive). By a direct calculation, it also holds that AM̃A^{−1} = M, thus proving that M and M̃ are similar. This concludes the proof.
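The similarity argument can be sanity-checked numerically: conjugating by an invertible diagonal matrix preserves the characteristic polynomial, hence the spectrum. A small Python sketch with random matrices (standing in for the actual matrices, which are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)
q = 4
Mtil = rng.random((q, q))                   # stands in for the matrix being conjugated
A = np.diag(np.sqrt(rng.random(q) + 0.5))   # invertible diagonal conjugation matrix
M = A @ Mtil @ np.linalg.inv(A)             # similar to Mtil

# Similar matrices share their characteristic polynomial, hence their spectrum.
assert np.allclose(np.poly(Mtil), np.poly(M))
```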
We now focus on connecting the Gibbs distribution of the spin model with interaction matrix B to the broadcasting process M. As before, let T be the infinite (∆ − 1)-ary tree with root ρ, and denote by T_ℓ the subtree of T consisting of the first ℓ levels of T and by W_ℓ the set of leaves of T_ℓ. We will denote by µ_ℓ the Gibbs distribution on T_ℓ corresponding to the spin system with interaction matrix B. We will use σ to denote configurations on T_ℓ and σ_{W_ℓ} to denote the restriction of σ to the leaves W_ℓ.
To connect µ_ℓ to the broadcasting process M on T, we will need just a few more definitions. Let Q_{W_ℓ}(·) be the following product distribution on configurations on the leaves W_ℓ, defined for every configuration η : W_ℓ → [q]. Finally, consider the following distribution ν_ℓ, also defined on configurations on the leaves W_ℓ, given by ν_ℓ(η) ∝ µ_ℓ(σ_{W_ℓ} = η) Q_{W_ℓ}(η) for all η : W_ℓ → [q]. It is instructive at this point to spell out the interplay of these definitions with the bound in (56). Namely, the product distribution Q_{W_ℓ}(η) is the generalization of the product distribution Q^i_W(η) (defined just after (41)) and ν_ℓ(·) is the generalization of the distribution ν_i(·) (defined in (54)).
We are now ready to state the desired connection.
(Set i = τ_v, j_1 = η_{v_1}, . . . , j_d = η_{v_d} and multiply over v ∈ W_ℓ.) To see (102), note that by (97), we have the stated relation for every i, j ∈ [q]; so, to prove (102), it suffices to show that the relevant quantities do not depend on i. This is a consequence of the fact that R is a fixpoint of the tree recursions, i.e., R_1, . . . , R_q satisfy (6). This completes the induction step for the first equality in the lemma.
We next show the induction step for the second equality in the lemma. Fix η : W_{ℓ+1} → [q]. Denote by ρ_1, . . . , ρ_d the children of the root ρ. For k ∈ [d], denote by W_{ℓ+1,k} the set of vertices in W_{ℓ+1} which are in the subtree of T rooted at ρ_k, and by η_k the restriction of η to W_{ℓ+1,k}. By the induction hypothesis, we have the corresponding statement for every k ∈ [d] and i ∈ [q]. Note that ν(σ_{W_{ℓ+1,k}} = η_k | σ_{ρ_k} = i) = ν(σ_{W_ℓ} = η_k | σ_ρ = i), from which we obtain the corresponding identity. (Note that the normalizing factor depends on η_k.) Using that T_{ℓ+1} is a tree, we then calculate (103) and (see [46, Lemma 3.1] for a thorough derivation) (104). By a completely analogous argument to the one we used for (100) and (101) (i.e., using (97) and the fact that R is a fixpoint of the tree recursions (6)), we obtain that the r.h.s. of (103) and (104) are proportional by a factor that does not depend on i, thus completing the induction step for the second equality in the lemma. This concludes the proof of Lemma 41.

A.3 Application to the Ferromagnetic Potts Model - Proof of (56)
We are now able to apply the results of Appendices A.1 and A.2 to the ferromagnetic Potts model and prove the bound (56) for the ordered phases on the tree. Recall that the interaction matrix B of the q-state ferromagnetic Potts model has diagonal entries equal to B > 1 and off-diagonal entries equal to 1. An ordered phase corresponds to a fixpoint R = (R_1, . . . , R_q) of the tree recursions. Thus, the R_i's satisfy

R_i ∝ (1 + (B − 1)R_i)^{∆−1} for all i ∈ [q].    (105)

Recall that there are q symmetric ordered phases, each corresponding to a color i ∈ [q]. W.l.o.g., we will focus on the ordered phase corresponding to the color i = 1. As we showed in Section 8, the solution of (105) corresponding to the ordered phase i = 1 is given by the vector R which satisfies (105), R_1 > R_2 = . . . = R_q and for which R_1/R_q is maximum (see Remark 5). Such a solution exists in the non-uniqueness region, i.e., when B > B_u. The broadcasting matrix M corresponding to the ordered phase i = 1 is given by (97):

M_ij = B_ij R_j / (1 + (B − 1)R_i).    (106)

We need the following lemma, which can be inferred from [35, Proof of Theorem 1.4]. For completeness, we give the proof.
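A numerical sketch of the ordered fixpoint (Python; the values of q, ∆ and B below are illustrative choices with B safely above B_u, and the recursion R_i ∝ (1 + (B − 1)R_i)^{∆−1} with its induced broadcasting matrix are our reading of (105) and (106)):

```python
import numpy as np

q, Delta, B = 3, 3, 4.0                  # illustrative values in the non-uniqueness region
Bmat = np.ones((q, q)) + (B - 1) * np.eye(q)   # Potts interaction matrix

R = np.array([0.8, 0.1, 0.1])            # asymmetric start, biased towards color 1
for _ in range(5000):
    # since sum_i R_i = 1, (Bmat @ R)_i = 1 + (B - 1) R_i
    R = (Bmat @ R) ** (Delta - 1)
    R /= R.sum()

new = (Bmat @ R) ** (Delta - 1); new /= new.sum()
assert np.allclose(new, R, atol=1e-10)           # R is a fixpoint of (105)
assert R[0] > R[1] and np.isclose(R[1], R[2])    # ordered: R_1 > R_2 = R_3

# Broadcasting matrix: M_ij = B_ij R_j / (1 + (B - 1) R_i)
M = (Bmat * R) / (Bmat @ R)[:, None]
assert np.allclose(M.sum(axis=1), 1)             # M is stochastic
```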
Lemma 42. Let ∆ ≥ 3 be an integer and B > B_u. Then, the broadcasting process M defined by (106) is non-reconstructible on the (∆ − 1)-ary tree.
Proof. Let i, j ∈ [q] be two distinct colors and consider two copies X, Y of the broadcasting process on the (∆ − 1)-ary tree, where the spins of the root ρ are conditioned to be i and j respectively. To show that the total variation distance between the distributions ν(σ_{W_ℓ} = · | σ_ρ = i) and ν(σ_{W_ℓ} = · | σ_ρ = j) goes to 0 as ℓ → ∞, it suffices to couple X, Y so that the expected number of disagreements, i.e., vertices in W_ℓ whose spins differ, goes to 0 as ℓ → ∞. In turn, it suffices to couple one step of the broadcasting process so that the expected number of disagreements is bounded by some constant κ < 1/(∆ − 1), since this yields that the expected number of disagreements at level ℓ decays exponentially in ℓ, at least as fast as ((∆ − 1)κ)^ℓ. In particular, let (u, v) be an arbitrary edge in the tree, with u the parent of v. By the Coupling Lemma, conditioned on the spins of u in X and Y, we can couple the spins of v in X and Y so that the probability that they differ is bounded by κ, where

κ := max_{i,j∈[q]} d_TV( ν(σ_v = · | σ_u = i), ν(σ_v = · | σ_u = j) ) = max_{i,j} (1/2) Σ_{k∈[q]} |M_ik − M_jk|.
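The quantity κ is exactly the Dobrushin contraction coefficient of the channel M: one broadcast step contracts the total variation distance between any two input distributions by a factor κ. A small Python check with an illustrative stochastic matrix (not the Potts M of the proof):

```python
import numpy as np

def dobrushin(M):
    # kappa = max over row pairs of d_TV(row_i, row_j) = (1/2) sum_k |M_ik - M_jk|
    q = M.shape[0]
    return max(0.5 * np.abs(M[i] - M[j]).sum() for i in range(q) for j in range(q))

M = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
kappa = dobrushin(M)

p = np.array([0.5, 0.3, 0.2])
r = np.array([0.1, 0.1, 0.8])
tv_before = 0.5 * np.abs(p - r).sum()
tv_after = 0.5 * np.abs(p @ M - r @ M).sum()
# one channel step contracts total variation by at most a factor kappa
assert tv_after <= kappa * tv_before + 1e-12
```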
In the following, we justify that κ < 1/(∆ − 1). We will see that, in the case of the ferromagnetic Potts model, κ is related to the eigenvalues of the Jacobian matrix of the tree recursions (evaluated at the fixpoint), which we have already studied in Section 8. First, we find a simpler expression for κ. Consider colors i, j ≠ 1. Then the total variation distance between the corresponding rows of M equals λ_1, where λ_1 is as in (39). Consider now the case that i = 1 and j ≠ 1. Using that B > 1 and R_1 > R_q, we have M_11 > M_j1 and M_1k < M_jk for k ≠ 1. It follows that the corresponding total variation distance equals λ_2, where λ_2 is as in (39). In the proof of Lemma 18, we showed that λ_1, λ_2 < 1/(∆ − 1) for all B > B_u, which shows that κ < 1/(∆ − 1), thus completing the proof of the lemma.
We conclude this appendix by giving the proof of (56).
Since (γ*, y*) is a critical point of Υ_2(γ, y), for all sufficiently small δ > 0 we have the expansion, where H = H^f_2 is the Hessian matrix of Υ_2 evaluated at (γ*, y*), scaled by 1/∆. We may then proceed analogously to the proof of Lemma 34 and obtain L = 2^{q^2−1} ∏_{(i,k,j,l)∈P_2}