Uniqueness, Spatial Mixing, and Approximation for Ferromagnetic 2-Spin Systems

For anti-ferromagnetic 2-spin systems, a beautiful connection has been established, namely that the following three notions align perfectly: the uniqueness in infinite regular trees, the decay of correlations (also known as spatial mixing), and the approximability of the partition function. The uniqueness condition implies spatial mixing, and an FPTAS for the partition function exists based on spatial mixing. On the other hand, non-uniqueness implies some long range correlation, based on which NP-hardness reductions are built. These connections for ferromagnetic 2-spin systems are much less clear, despite their similarities to anti-ferromagnetic systems. The celebrated Jerrum-Sinclair Markov chain [JS93] works even if spatial mixing or uniqueness fails. We provide some partial answers. We use $(\beta,\gamma)$ to denote the $(+,+)$ and $(-,-)$ edge interactions and $\lambda$ the external field, where $\beta\gamma>1$. If all fields satisfy $\lambda<\lambda_c$ (assuming $\beta\le\gamma$), where $\lambda_c=\left(\gamma/\beta\right)^\frac{\Delta_c+1}{2}$ and $\Delta_c=\frac{\sqrt{\beta\gamma}+1}{\sqrt{\beta\gamma}-1}$, then a weaker version of spatial mixing holds in all trees. Moreover, if $\beta\le 1$, then $\lambda<\lambda_c$ is sufficient to guarantee strong spatial mixing and FPTAS. This improves the previous best algorithm, which is an FPRAS based on Markov chains and works for $\lambda<\gamma/\beta$ [LLZ14a]. The bound $\lambda_c$ is almost optimal. When $\beta\le 1$, uniqueness holds in all infinite regular trees, if and only if $\lambda\le\lambda_c^{int}$, where $\lambda_c^{int}=\left(\gamma/\beta\right)^\frac{\lceil\Delta_c\rceil+1}{2}$. If we allow fields $\lambda>\lambda_c^{int'}$, where $\lambda_c^{int'}=\left(\gamma/\beta\right)^\frac{\lfloor\Delta_c\rfloor+2}{2}$, then approximating the partition function is #BIS-hard.

The bound $\lambda_c$ is almost optimal and can be viewed as a variant of the uniqueness condition with the degree $d$ relaxed to be a real number instead of an integer. Interestingly, unless $\Delta_c$ is an integer, neither $\lambda_c$ nor $\lambda_c^{int}$ is the tight bound in its own respect. We provide examples where correlation decay continues to hold beyond $\lambda_c$, and irregular trees in which spatial mixing fails for some $\lambda<\lambda_c^{int}$.

Introduction
Spin systems model nearest neighbor interactions. In this paper we study 2-state spin systems. An instance is a graph G = (V, E), and a configuration σ assigns one of the two spins "0" and "1" to each vertex, that is, σ is one of the 2^{|V|} possible assignments σ : V → {0, 1}. The local interaction along an edge is specified by a matrix $A = \begin{pmatrix} A_{0,0} & A_{0,1} \\ A_{1,0} & A_{1,1} \end{pmatrix}$, where A_{i,j} is the (non-negative) local weight when the two endpoints are assigned i and j respectively. We study symmetric edge interactions, that is, A_{0,1} = A_{1,0}. Normalize A so that $A = \begin{pmatrix} \beta & 1 \\ 1 & \gamma \end{pmatrix}$. Moreover, we also consider the external field, specified by a mapping π : V → R^+. When a vertex v is assigned "0", we give it a weight π(v). For a particular configuration σ, its weight w(σ) is a product over all edge interactions and vertex weights, that is, $w(\sigma) = \beta^{m_0(\sigma)} \gamma^{m_1(\sigma)} \prod_{v:\,\sigma(v)=0} \pi(v)$, where m_0(σ) is the number of (0, 0) edges under the configuration σ and m_1(σ) is the number of (1, 1) edges. An important special case is the Ising model, where β = γ. The Gibbs measure is a natural distribution in which each configuration σ is drawn with probability proportional to its weight, that is, Pr_{G;β,γ,π}(σ) ∝ w(σ). The normalizing factor of the Gibbs measure is called the partition function, defined by $Z_{\beta,\gamma,\pi}(G) = \sum_{\sigma: V \to \{0,1\}} w(\sigma)$. The partition function encodes rich information regarding the macroscopic behavior of the spin system. We will be interested in the computational complexity of approximating Z_{β,γ,π}(G). We also simply write Z_{β,γ,λ}(G) when the field is uniform, that is, π(v) = λ for all v ∈ V. A system with uniform fields is specified by the three parameters (β, γ, λ).
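As a concrete (and deliberately naive) illustration of these definitions, the following brute-force sketch computes Z_{β,γ,π}(G) by direct enumeration; the function and variable names are ours, not from the paper.

```python
from itertools import product

def partition_function(vertices, edges, beta, gamma, pi):
    """Brute-force Z_{beta,gamma,pi}(G): sum w(sigma) over all 2^|V| configurations,
    where w(sigma) = beta^{m0} * gamma^{m1} * prod_{v: sigma(v)=0} pi[v]."""
    Z = 0.0
    for sigma in product((0, 1), repeat=len(vertices)):
        assign = dict(zip(vertices, sigma))
        w = 1.0
        for u, v in edges:
            if assign[u] == 0 and assign[v] == 0:
                w *= beta      # a (0,0) edge contributes beta
            elif assign[u] == 1 and assign[v] == 1:
                w *= gamma     # a (1,1) edge contributes gamma
        for v in vertices:
            if assign[v] == 0:
                w *= pi[v]     # external field attached to spin "0"
        Z += w
    return Z
```

For a single edge with uniform field λ this reduces to βλ² + 2λ + γ, a handy sanity check.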
Spin systems are classified into two families with distinct physical and computational properties: ferromagnetic systems where the edge interaction is attractive (βγ > 1), and anti-ferromagnetic systems where it is repulsive (βγ < 1).
Recently, beautiful connections have been established regarding three different aspects of antiferromagnetic 2-spin systems. The uniqueness of Gibbs measures in infinite regular trees of degrees up to ∆ implies correlation decay in all graphs of maximum degree ∆, and therefore the existence of a fully polynomial-time approximation scheme (FPTAS) for the partition function [Wei06, LLY12, SST12, LLY13], while if uniqueness fails, then long range correlation appears and the partition function admits no fully polynomial-time randomized approximation scheme (FPRAS) unless NP = RP [SS14, GŠV12]. This suggests that the mathematical property of uniqueness, the physical property of spatial mixing, and the computational property of approximating the partition function line up perfectly in anti-ferromagnetic 2-spin systems.
For ferromagnetic systems, the picture is much less clear. In a seminal paper [JS93], Jerrum and Sinclair gave an FPRAS for the ferromagnetic Ising model β = γ > 1 with any consistent external field λ for general graphs without degree bounds. Thus, despite the phase transition of the uniqueness and spatial mixing, there is no computational complexity transition of approximating its partition function in the ferromagnetic Ising model. This is in sharp contrast to anti-ferromagnetic Ising models β = γ < 1, where computational and phase transitions align perfectly. It is not clear at all whether spatial mixing or correlation decay plays any role in the computational complexity.
For general ferromagnetic 2-spin systems with external fields, the threshold of approximating the partition function is still open. On the complexity side, Goldberg and Jerrum showed that any ferromagnetic 2-spin system is no harder than counting independent sets in bipartite graphs (#BIS) [GJ07], a problem conjectured to have no FPRAS [DGGJ03] (the approximation complexity of #BIS is still open). Based on an earlier result [CGG+14], Liu, Lu and Zhang showed that approximating the partition function is #BIS-hard if we allow external fields beyond $(\gamma/\beta)^{\frac{\lfloor\Delta_c\rfloor+2}{2}}$, where $\Delta_c=\frac{\sqrt{\beta\gamma}+1}{\sqrt{\beta\gamma}-1}$ [LLZ14]. On the algorithmic side, by reducing to the Ising model, an MCMC-based FPRAS is known for the range λ ≤ √(γ/β) [GJP03], which was recently improved to λ ≤ γ/β [LLZ14]. On the other hand, if we apply the correlation decay algorithmic framework to various individual pairs of parameters (β, γ), it is not hard to get better bounds than γ/β. However, such successes for individual problems do not seem to share meaningful inner connections. In particular, it is not clear how far one may push this method.
In this paper, we identify a threshold that almost tightly maps out the boundary of the correlation decay regime, that is, $\lambda_c=(\gamma/\beta)^{\frac{\Delta_c+1}{2}}$. We show that for any λ < λ_c a variant of spatial mixing holds (Theorem 1) for arbitrary trees. If $\lambda>(\gamma/\beta)^{\frac{\lceil\Delta_c\rceil+1}{2}}$, then this spatial mixing does not hold. This spatial mixing is weaker than what an algorithm usually requires, but in the regime of β ≤ 1 it implies (and therefore is equivalent to) strong spatial mixing. As an algorithmic consequence, we have an FPTAS for all β ≤ 1 < γ, βγ > 1, and λ < λ_c (Theorem 2). Recall that if we allow fields beyond $(\gamma/\beta)^{\frac{\lfloor\Delta_c\rfloor+2}{2}}$, then the problem is #BIS-hard [LLZ14]. Hence only an integral gap remains for the β ≤ 1 < γ case.
The reason behind λ_c is a nice interplay among uniqueness, spatial mixing, and approximability. We start with some purely mathematical observations on the symmetric tree recursion $f_d(x)=\lambda\left(\frac{\beta x+1}{x+\gamma}\right)^d$, an increasing function in x. Relax the range of d in f_d(x) to be real numbers, and we can get that ∆_c is the critical (possibly fractional) degree and λ_c is the corresponding critical external field. Let us call this condition "λ < λ_c" the fractional uniqueness condition. This set of critical parameters enjoys some very nice mathematical properties. For d = ∆_c and λ = λ_c, the function f_d(x) has a unique fixed point $\hat{x}=\sqrt{\gamma/\beta}$ and $f_d'(\hat{x})=1$. Moreover, it also satisfies that $f_d''(\hat{x})=0$, which is a necessary condition for the contraction of the tree recursion (easily derived using the heuristic of finding potential functions described in [LLY13]). All these nice mathematical properties prove to be useful in our later analysis. For degrees other than ∆_c, their critical external fields are much less convenient from a mathematical point of view. The function f_d(x) has two fixed points: one is crossing and the other is tangent. Moreover, $f_d''(\hat{x})=0$ does not necessarily hold.
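These three properties (fixed point at √(γ/β), unit derivative, vanishing second derivative) can be checked numerically; the parameters below are hypothetical, chosen only so that βγ > 1.

```python
import math

def f(x, d, lam, beta, gamma):
    # symmetric tree recursion f_d(x) = lam * ((beta*x + 1) / (x + gamma))**d
    return lam * ((beta * x + 1) / (x + gamma)) ** d

beta, gamma = 0.8, 2.0                      # hypothetical parameters with beta*gamma > 1
r = math.sqrt(beta * gamma)
Delta_c = (r + 1) / (r - 1)                 # critical (fractional) degree
lam_c = (gamma / beta) ** ((Delta_c + 1) / 2)
x_hat = math.sqrt(gamma / beta)

fx = f(x_hat, Delta_c, lam_c, beta, gamma)  # should equal x_hat (fixed point)
h = 1e-6
d1 = (f(x_hat + h, Delta_c, lam_c, beta, gamma)
      - f(x_hat - h, Delta_c, lam_c, beta, gamma)) / (2 * h)   # should be 1 (tangency)
h2 = 1e-4
d2 = (f(x_hat + h2, Delta_c, lam_c, beta, gamma)
      - 2 * fx
      + f(x_hat - h2, Delta_c, lam_c, beta, gamma)) / h2 ** 2  # should vanish
```

The finite-difference derivatives confirm the tangency and inflection at the critical parameters.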
Starting from this purely mathematical observation, we prove that the fractional uniqueness condition does have its own significance, although we do not know any combinatorial interpretation of fractional degrees. We prove that any system satisfying the fractional uniqueness condition exhibits a variant of spatial mixing in all trees, which are not necessarily regular, as follows. We use p_v to denote the marginal probability of v (being assigned "0"). Theorem 1. Let (β, γ, λ) be a set of parameters of the system such that βγ > 1, β ≤ γ, and λ < λ_c. Let T_v and T_{v'} be two trees with roots v and v' respectively. If the two trees have the same structure in the first ℓ levels, then $|p_v - p_{v'}| \le \exp(-\Omega(\ell))$. In other words, if we simply truncate a tree at depth ℓ, the marginal probability of its root will change by at most exp(−Ω(ℓ)). Surprisingly, if we replace this fractional uniqueness condition by its integral counterpart, then this implication no longer holds and there is a counterexample (see Section 5). More precisely, it is no longer true that uniqueness in infinite regular trees implies correlation decay in graphs or even trees, since our counterexample is an irregular tree. We note that this is in sharp contrast to anti-ferromagnetic systems, where (integral) uniqueness implies correlation decay.
The proof of Theorem 1 uses the potential method to analyze the decay of correlation, which is now streamlined (see e.g. [LLY13]). The main difficulty is to find a good potential. In other words, we want to solve a variational problem minimizing the maximum of the decay rate function. The main novelty in our choice of the potential is that it does not work for all variables. Instead, it is well-defined only for variables in the range of (0, λ/(1+λ)]. Here variables represent marginal probabilities of a vertex being assigned "0", and it can be easily verified that in trees without boundary conditions, this marginal probability lies in (0, λ/(1+λ)] given the assumption of Theorem 1. Indeed, with this assumption, in any instance (not necessarily a tree) without pinning, the marginal probability is always within this range (see Proposition 3.5). Also note that with our choice, the proof is relatively clean and significantly simpler than similar proofs in other settings. In particular, we do not need a certain "symmetrization" argument (see e.g. [LLY13, SSŠY15]). We also use a trick of truncating the potential to deal with unbounded degrees (see Eq. (5)).
From the computational complexity point of view, we would like to get an FPTAS for the partition function, which requires a condition called strong spatial mixing (SSM). It is stronger than the spatial mixing in Theorem 1 in that it imposes arbitrary partial configurations. We are able to prove SSM under the fractional uniqueness condition for the range of β ≤ 1. Indeed, if β ≤ 1, then the two versions of spatial mixing are equivalent. Let I be an interval of the form [λ_1, λ_2] or (λ_1, λ_2]. We consider the following problem.
Name: #2SPIN(β, γ, I)
Instance: A graph G = (V, E) and a mapping π : V → R^+ such that π(v) ∈ I for every v ∈ V.
Output: The partition function Z_{β,γ,π}(G).
Then we have the following theorem. Theorem 2. Let (β, γ) be two parameters such that βγ > 1 and β ≤ 1 < γ, and let I be an interval as above with λ_2 < λ_c. Then there is an FPTAS for #2SPIN(β, γ, I).
For the range of β > 1, SSM does not hold even if λ < λ_c. However, we conjecture that Theorem 2 can be extended to the β > 1 range as well, mainly due to Theorem 1, which does not require β ≤ 1. Moreover, we show that even if β > 1, the marginal probability in any instance is within the range of (0, λ/(1+λ)] given λ < λ_c (see Proposition 3.5). This seems to indicate that our algorithm fails mainly because of the pinnings in the self-avoiding walk tree construction, whereas in a real instance these pinnings cannot aggregate enough "bad" influence. However, turning such intuition into an algorithm requires a careful treatment of these pinnings to achieve an FPTAS without SSM. We leave this as an important open question.
At last, we note that neither λ_c nor its integral counterpart is the exact threshold in its own respect, even if β ≤ 1. Strong spatial mixing continues to hold for λ in a small interval beyond λ_c. We give a concrete example to illustrate this point in Section 4, Proposition 4.1. Moreover, as mentioned earlier, there exists an irregular tree whose correlation decay threshold is lower than the threshold for all infinite regular trees. This is discussed in Section 5. It is another important open question to pin down the exact threshold between λ_c and its integral counterpart.

Preliminaries
An instance of a 2-spin system is a graph G = (V, E). A configuration σ : V → {0, 1} assigns one of the two spins "0" and "1" to each vertex. We normalize the edge interaction to be $\begin{pmatrix} \beta & 1 \\ 1 & \gamma \end{pmatrix}$, and also consider the external field, specified by a mapping π : V → R^+. When a vertex v is assigned "0", we give it a weight π(v). All parameters are non-negative. For a particular configuration σ, its weight w(σ) is a product over all edge interactions and vertex weights, that is, $w(\sigma) = \beta^{m_0(\sigma)} \gamma^{m_1(\sigma)} \prod_{v:\,\sigma(v)=0} \pi(v)$, where m_0(σ) is the number of (0, 0) edges given by the configuration σ and m_1(σ) is the number of (1, 1) edges. An important special case is the Ising model, where β = γ. Notice that in the statistical physics literature, parameters are usually chosen to be the logarithms of our parameters above. Different parameterizations do not affect the complexity of the same system. We also write λ_v := π(v). If π is a constant function such that λ_v = λ > 0 for all v ∈ V, we also denote it by λ. We say π has a lower bound (resp. an upper bound) λ > 0 if π satisfies the guarantee that λ_v ≥ λ (resp. λ_v ≤ λ) for all v ∈ V. The Gibbs measure is a natural distribution in which each configuration σ is drawn with probability proportional to its weight, that is, Pr_{G;β,γ,π}(σ) ∝ w(σ). The normalizing factor of the Gibbs measure is called the partition function, defined by $Z_{\beta,\gamma,\pi}(G) = \sum_{\sigma: V \to \{0,1\}} w(\sigma)$. Recall that we are interested in the computational problem #2SPIN(β, γ, I), where I is an interval of the form [λ_1, λ_2] or (λ_1, λ_2], for which Z_{β,γ,π}(G) is the output. When input graphs are restricted to have a degree bound ∆, we write #∆-2SPIN(β, γ, I) to denote the problem. When the field is uniform, that is, λ is the only element in I, we simply write #2SPIN(β, γ, λ). Due to [CK12] and a standard diagonal transformation, for any constant λ > 0, exactly computing #2SPIN(β, γ, λ) is #P-hard unless β = γ = 0 or βγ = 1.

The Self-Avoiding Walk Tree
We briefly describe Weitz's algorithm [Wei06]. Our algorithms presented later will follow roughly the same paradigm.
The Gibbs measure defines a marginal distribution of spins for each vertex. Let p v denote the probability of a vertex v being assigned "0". Since the system is self-reducible, #2SPIN(β, γ, λ) is equivalent to computing p v [JVV86] (for details, see for example Lemma 2.6).
Let σ Λ ∈ {0, 1} Λ be a configuration of Λ ⊂ V. We call vertices in Λ fixed and other vertices free. We use p σ Λ v to denote the marginal probability of v being assigned "0" conditional on the configuration σ Λ of Λ.
Suppose the instance is a tree T with root v. Let $R^{\sigma_\Lambda}_T := p^{\sigma_\Lambda}_v/(1-p^{\sigma_\Lambda}_v)$ be the ratio between the two probabilities that the root v is 0 and 1, while imposing some condition σ_Λ (with the convention that $R^{\sigma_\Lambda}_T = \infty$ when $p^{\sigma_\Lambda}_v = 1$). Suppose that v has d children v_1, …, v_d. Let T_i be the subtree with root v_i. Due to the independence of subtrees, it is straightforward to get the following recursion for calculating $R^{\sigma_\Lambda}_T$: $R^{\sigma_\Lambda}_T = F_d\left(R^{\sigma_\Lambda}_{T_1}, \dots, R^{\sigma_\Lambda}_{T_d}\right)$, (2) where the function F_d(x_1, …, x_d) is defined as $F_d(x_1,\dots,x_d) := \lambda \prod_{i=1}^{d} \frac{\beta x_i+1}{x_i+\gamma}$. We allow the x_i's to take the value ∞, as in that case the function F_d is still clearly well defined. In general we use capital letters like F, G, C, … to denote multivariate functions, and small letters f, g, c, … to denote their symmetric versions, where all variables take the same value.
Here we define $f_d(x) := \lambda\left(\frac{\beta x+1}{x+\gamma}\right)^d$ to be the symmetric version of F_d(x).
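On an explicit tree (no pinning, uniform field), the recursion can be sketched as below; the names are ours.

```python
def tree_ratio(children, lam, beta, gamma, v=0):
    """R_v = lam * prod over children u of (beta*R_u + 1) / (R_u + gamma),
    where `children` maps a vertex to the list of its children (leaves map to [])."""
    R = lam
    for u in children.get(v, []):
        Ru = tree_ratio(children, lam, beta, gamma, u)
        R *= (beta * Ru + 1) / (Ru + gamma)
    return R
```

A single vertex gives R_v = λ; a star with two leaves gives λ((βλ+1)/(λ+γ))², matching the symmetric version f_2(λ).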
In contrast to the case of trees, there is no easy recursion to calculate $R^{\sigma_\Lambda}_{G,v}$ for a general graph G, because of dependencies due to cycles. Weitz [Wei06] reduced computing the marginal distribution of v in a general graph G to that in a tree, called the self-avoiding walk (SAW) tree, denoted by T_SAW(G, v). To be specific, given a graph G = (V, E) and a vertex v ∈ V, T_SAW(G, v) is a tree with root v that enumerates all self-avoiding walks originating from v in G, with additional vertices closing cycles as leaves of the tree. Each vertex in the new vertex set V_SAW of T_SAW(G, v) corresponds to a vertex in G, but a vertex in G may be mapped to more than one vertex in V_SAW. A boundary condition is imposed on the leaves in V_SAW that close cycles: the spin imposed on such a leaf depends on whether its cycle is closed from a smaller vertex to a larger one or conversely, where the ordering of the vertices of G is fixed arbitrarily. Vertex sets S ⊂ Λ ⊂ V are mapped respectively to S_SAW ⊂ Λ_SAW ⊂ V_SAW, and any configuration σ_Λ ∈ {0,1}^Λ is mapped to σ_{Λ_SAW} ∈ {0,1}^{Λ_SAW}. With a slight abuse of notation we may write S = S_SAW and σ_Λ = σ_{Λ_SAW} when no ambiguity is caused.
Theorem 2.1 ([Wei06]). Let G = (V, E) be a graph, v ∈ V, and T = T_SAW(G, v). Then for any Λ ⊆ V and any configuration σ_Λ, $p^{\sigma_\Lambda}_{G;v} = p^{\sigma_{\Lambda_{SAW}}}_{T;v}$, and any neighborhood of v in T can be constructed in time proportional to the size of the neighborhood.
The SAW tree construction does not solve a #P-hard problem, since T SAW (G, v) is potentially exponentially large in size of G. For a polynomial time approximation algorithm, we may run the tree recursion within some polynomial size, or equivalently a logarithmic depth. At the boundary where we stop, we plug in some arbitrary values. The question is then how large is the error due to our random guess. To guarantee the performance of the algorithm, we need the following notion of strong spatial mixing.
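To see the effect of truncation on a regular tree, one can iterate the symmetric recursion from two extreme boundary guesses and watch the gap collapse. The parameters here are hypothetical, with d well below ∆_c so the iteration contracts.

```python
def iterate(steps, x0, d, lam, beta, gamma):
    # run the symmetric tree recursion f_d for `steps` levels, from boundary guess x0
    x = x0
    for _ in range(steps):
        x = lam * ((beta * x + 1) / (x + gamma)) ** d
    return x

beta, gamma, lam, d = 0.8, 2.0, 1.2, 2   # hypothetical; Delta_c ~ 8.55, so d = 2 contracts
lo = iterate(20, 1e-3, d, lam, beta, gamma)   # boundary guessed "almost all 1"
hi = iterate(20, 1e3, d, lam, beta, gamma)    # boundary guessed "almost all 0"
gap = abs(lo - hi)                            # exponentially small in the depth
```

After 20 levels the two arbitrary boundary guesses agree to many digits, which is exactly the error bound an early-terminated SAW-tree recursion needs.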

Definition 2.2. A spin system on a family G of graphs is said to exhibit strong spatial mixing (SSM) if for any graph G = (V, E) ∈ G, any vertex v ∈ V, and any two configurations σ_Λ, τ_Λ ∈ {0,1}^Λ on a subset Λ ⊆ V, $\left|p^{\sigma_\Lambda}_v - p^{\tau_\Lambda}_v\right| \le \exp(-\Omega(\mathrm{dist}(v, S)))$, where S ⊆ Λ is the subset on which σ_Λ and τ_Λ differ, and dist(v, S) is the shortest distance from v to any vertex in S.
The weak spatial mixing is defined similarly by measuring the decay with respect to dist(v, Λ) instead of dist(v, S). Spatial mixing properties are also called correlation decay in Statistical Physics.
If SSM holds, then the error caused by early termination in T SAW (G, v) and random boundary values is only exponentially small in the depth. Hence our algorithm is an FPTAS. In a lot of cases, the existence of an FPTAS boils down to showing SSM holds.

The Uniqueness Condition in Regular Trees
Let T_d denote the infinite d-regular tree, also known as the Bethe lattice or the Cayley tree. If we pick an arbitrary vertex as the root of T_d, then the root has d children and every other vertex has d − 1 children. Notice that the difference between T_d and an infinite (d − 1)-ary tree is only the degree of the root. We consider the uniqueness of Gibbs measures on T_d, where the field is uniformly λ > 0. Due to the symmetric structure of T_d, the standard recursion (2) thus becomes the fixed-point equation $x = f_{d-1}(x)$. For anti-ferromagnetic systems, that is, βγ < 1, there is a unique fixed point of f_{d−1}(x) = x, denoted by $\hat{x}$. It has been shown that the Gibbs measure in T_d is unique if and only if $|f_{d-1}'(\hat{x})| \le 1$. In contrast, if βγ > 1, then $f_d'(x) > 0$ for any x > 0. There may be 1 or 3 positive fixed points such that x = f_{d−1}(x). It is known [Kel85, Geo11] that the Gibbs measure of two-state spin systems in T_d is unique if and only if there is only one fixed point of x = f_{d−1}(x). Then we have the following result. Proposition 2.3. Let (β, γ) be two parameters such that βγ > 1 and β ≤ γ. The uniqueness condition holds in T_d for every field λ > 0 if and only if d − 1 ≤ ∆_c. Note that the condition ∆ − 1 < ∆_c matches the exact threshold of fast mixing for Gibbs samplers in the Ising model [MS13]. In Section 3.1, we will show that SSM holds and there exists an FPTAS for the partition function in graphs with degree bound ∆ < ∆_c + 1. This is Theorem 3.
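The fixed-point criterion above can be probed numerically by counting sign changes of f_{d−1}(x) − x on a fine grid. The parameters below are hypothetical (β = 0.5, γ = 4, so ∆_c ≈ 5.83), and the field λ = (γ/β)^{d/2} is chosen inside the non-uniqueness window of T_8.

```python
def count_fixed_points(d, lam, beta, gamma, grid=100000):
    """Count sign changes of g(x) = f_{d-1}(x) - x on a log-spaced grid
    (a numerical sketch, not a proof)."""
    def g(x):
        return lam * ((beta * x + 1) / (x + gamma)) ** (d - 1) - x
    xs = [10 ** (-6 + 9 * i / grid) for i in range(grid + 1)]  # 1e-6 .. 1e3
    count, prev = 0, g(xs[0])
    for x in xs[1:]:
        cur = g(x)
        if prev * cur < 0:
            count += 1
        prev = cur
    return count
```

With β = 0.5 and γ = 4, so that d − 1 = 7 > ∆_c for T_8, the field λ = 8⁴ = 4096 yields three fixed points (non-uniqueness), while λ = 1 yields one.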
To study general graphs, one needs to consider infinite regular trees of all degrees. If β > 1 (still assuming βγ > 1 and β ≤ γ), then there is no λ such that the uniqueness condition holds in T_d for all degrees d ≥ 2. In contrast, let $\lambda_c^{int} := (\gamma/\beta)^{\frac{\lceil\Delta_c\rceil+1}{2}}$ and we have the following.
Proposition 2.4. Let (β, γ) be two parameters such that βγ > 1 and β ≤ 1 < γ. The uniqueness condition holds in T_d for all degrees d ≥ 2 if and only if λ < λ_c^{int}.
Details and proofs about Propositions 2.3 and 2.4 are given in Section 6.1.

The Potential Method
We would like to prove strong spatial mixing in arbitrary trees, sometimes with bounded degree ∆, under certain conditions. This is sufficient for approximation algorithms due to the self-avoiding walk tree construction. Our main technique in the analysis is the potential method. The analysis in this section is a standard routine, with some specialization to ferromagnetic 2-spin models (cf. [LLY13, SSŠY15]). To avoid interrupting the flow, we move all details and proofs to Section 6.2. Roughly speaking, instead of studying (2) directly, we use a potential function Φ(x) to map the original recursion to a new domain (see the commutative diagram Figure 1). In principle we can choose any function as the potential. However, we would like to pick "good" ones so as to help the analysis of the contraction. Define ϕ(x) := Φ'(x) and the amortized decay rate $C_{\varphi,d}(x_1,\dots,x_d) := \varphi(F_d(\mathbf{x})) \sum_{i=1}^{d} \frac{1}{\varphi(x_i)}\left|\frac{\partial F_d}{\partial x_i}(\mathbf{x})\right|$. Definition 2.5. Let Φ : R^+ → R^+ be a differentiable and monotonically increasing function. Let ϕ(x) and C_{ϕ,d}(x) be defined as above. Then Φ(x) is a good potential function for degree d and field λ if it satisfies the following conditions: (1) ϕ(x) is bounded above and below by positive constants on the domain of the recursion; (2) there exists a constant α < 1 such that C_{ϕ,d}(x_1, …, x_d) ≤ α for all x_1, …, x_d in the domain. We say Φ(x) is a good potential function for d and field π if Φ(x) is a good potential function for d and any λ in the codomain of π. In Definition 2.5, Condition 1 is rather easy to satisfy. The crux is in fact Condition 2. We call α in Condition 2 the amortized contraction ratio of Φ(x). It has the following algorithmic implication. The proof is based on verifying strong spatial mixing.
Lemma 2.6. Let (β, γ) be two parameters such that βγ > 1. Let G = (V, E) be a graph with maximum degree ∆ and n vertices, and let π be a field on G. Let λ = max_{v∈V} {π(v)}. If there exists a good potential function for π and all d ∈ [1, ∆ − 1] with contraction ratio α < 1, then Z_{β,γ,π}(G) can be approximated deterministically within a relative error ε in time polynomial in n, λ, and 1/ε. When the degree is unbounded, the SAW tree may grow super-polynomially even if the depth is of order log n. We use a refined metric replacing the naive graph distance used in Definition 2.2. Strong spatial mixing under this metric is also called computationally efficient correlation decay [LLY12, LLY13].

Definition 2.7. Let T be a rooted tree and M > 1 be a constant. For any vertex
Let B(ℓ) be the set of all vertices whose M-based depth is at most ℓ. It is easy to verify inductively that |B(ℓ)| ≤ M^ℓ in a tree. We then define a slightly stronger notion of potential functions.
Definition 2.8. Let Φ : R^+ → R^+ be a differentiable and monotonically increasing function. Let ϕ(x) and C_{ϕ,d}(x) be defined in the same way as in Definition 2.5. Then Φ(x) is a universal potential function for the field λ if it satisfies the following conditions: We say Φ(x) is a universal potential function for a field π if Φ(x) is a universal potential function for any λ in the codomain of π. We also call α the contraction ratio and M the base. The following two lemmas show that our main theorems follow from the existence of a universal potential function.
The way we define universal potential functions restricts them to only apply to the range of (0, λ]. This will be true in our applications (see for example Claim 3.3).
Lemma 2.9. Let (β, γ, λ) be three parameters such that βγ > 1, β ≤ γ, and λ < λ_c. Let T and T' be two trees that agree on the first ℓ levels, with roots v and v' respectively. If there exists a universal potential function Φ(x) for λ with contraction ratio α < 1, then $|p_v - p_{v'}| \le \exp(-\Omega(\ell))$. Lemma 2.10. Let (β, γ) be two parameters such that βγ > 1 and β ≤ 1 < γ. Let G = (V, E) be a graph with n vertices and π be a field on G. Let λ = max_{v∈V} {π(v)}. If there exists a universal potential function Φ(x) for π with contraction ratio α < 1 and base M, then Z_{β,γ,π}(G) can be approximated deterministically within a relative error ε in time polynomial in n, λ, and 1/ε.

Correlation Decay below ∆ c or λ c
In this section, we show our main results. We will first show a folklore result for bounded degree graphs with a very simple proof. Then we continue to show the main theorem regarding general graphs. We carefully choose two appropriate potential functions and then apply Lemma 2.6 or Lemma 2.10.

Bounded Degree Graphs
We first apply our framework to get an FPTAS for graphs with degree bound $\Delta < \Delta_c + 1 = \frac{2\sqrt{\beta\gamma}}{\sqrt{\beta\gamma}-1}$. Correlation decay for graphs with such degree bounds is folklore and can be found in [Lyo89] for the Ising model. Algorithmic implications are also shown, e.g. in [ZLB11]. As we shall see, the proof is very simple in our framework. Note that λ, ∆, and α are considered constants for the FPTAS.
Theorem 3. Let (β, γ) be two parameters such that βγ > 1. Let G = (V, E) be a graph with maximum degree ∆ < ∆_c + 1 and n vertices, and let π be a field on G. Let λ = max_{v∈V} {π(v)}. Then Z_{β,γ,π}(G) can be approximated deterministically within a relative error ε in time polynomial in n, λ, and 1/ε. Proof. We choose our potential function to be Φ_1(x) = log x, so that $\varphi_1(x) := \Phi_1'(x) = 1/x$. We verify the conditions of Definition 2.5. Condition 1 is trivial. For Condition 2, we have that for any d ≤ ∆ − 1 and any x_1, …, x_d > 0, $C_{\varphi_1,d}(\mathbf{x}) = \sum_{i=1}^{d} \frac{(\beta\gamma-1)x_i}{(\beta x_i+1)(x_i+\gamma)} \le \frac{d}{\Delta_c} \le \frac{\Delta-1}{\Delta_c} =: \alpha < 1$, where we used the fact that for any x > 0, $\frac{(\beta\gamma-1)x}{(\beta x+1)(x+\gamma)} \le \frac{\sqrt{\beta\gamma}-1}{\sqrt{\beta\gamma}+1} = \frac{1}{\Delta_c}$, with the maximum attained at $x = \sqrt{\gamma/\beta}$. Hence Φ_1(x) is a good potential function for all degrees d ∈ [1, ∆ − 1] with contraction ratio α. The theorem follows by Lemma 2.6.
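The inequality used in this proof, namely that the one-step decay factor x(βγ−1)/((βx+1)(x+γ)) never exceeds 1/∆_c and peaks at x = √(γ/β), is easy to confirm numerically (hypothetical parameters):

```python
import math

beta, gamma = 0.5, 4.0                       # hypothetical, beta*gamma > 1
Delta_c = (math.sqrt(beta * gamma) + 1) / (math.sqrt(beta * gamma) - 1)

def decay(x):
    # one-step decay factor under the potential Phi_1(x) = log x
    return x * (beta * gamma - 1) / ((beta * x + 1) * (x + gamma))

xs = [10 ** (-4 + 8 * i / 100000) for i in range(100001)]  # log grid 1e-4 .. 1e4
max_decay = max(decay(x) for x in xs)        # should be 1 / Delta_c
arg = math.sqrt(gamma / beta)                # where the maximum is attained
```

Scanning a wide log-spaced grid recovers the bound 1/∆_c up to grid resolution.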
Note that Theorem 3 matches the uniqueness condition in Proposition 2.3 and, restricted to the Ising model, the fast mixing bound of Gibbs samplers in [MS13].

General Graphs
The following two technical lemmas show some important properties regarding the threshold λ_c, which are key to our main theorems. Proofs are given in Section 6.3.
Lemma 3.2. Let β, γ be two parameters such that βγ > 1 and β ≤ γ. For any 0 < x ≤ λ_c, we have In our applications, the quantity x in both lemmas will be the ratio of marginal probabilities in trees, denoted by R_v for a vertex v. To make use of these properties, one key requirement is that 0 < x ≤ λ_c. This is not necessarily true in trees with pinning (and therefore not true in general SAW trees). Nevertheless, it does hold in trees without pinning. Claim 3.3. For (β, γ, λ) where βγ > 1, β ≤ γ, and λ < λ_c, R_v ∈ (0, λ] holds in trees without pinning. We prove Claim 3.3 by induction. For any tree T_v, if v is the only vertex, then R_v = λ and the base case holds. Given Lemma 3.1 and λ < λ_c, the inductive step to show Claim 3.3 follows from the standard tree recursion (2).
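The induction step can be spot-checked by brute force: with β ≤ 1 and a field below λ_c, the recursion F_d maps tuples from (0, λ] back into (0, λ]. The parameters below are hypothetical.

```python
import math
import random

beta, gamma = 0.5, 4.0                      # hypothetical: beta <= 1 < gamma, beta*gamma > 1
r = math.sqrt(beta * gamma)
Delta_c = (r + 1) / (r - 1)
lam = 0.9 * (gamma / beta) ** ((Delta_c + 1) / 2)   # some field below lambda_c

def F(xs):
    # multivariate recursion F_d(x_1, ..., x_d) with field lam
    out = lam
    for x in xs:
        out *= (beta * x + 1) / (x + gamma)
    return out

random.seed(0)
closed = all(
    0.0 < F([random.uniform(1e-9, lam) for _ in range(random.randint(0, 10))]) <= lam
    for _ in range(10000)
)
```

Every random tuple of child ratios in (0, λ] produces a root ratio in (0, λ], consistent with Claim 3.3.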
In addition, it also holds when β ≤ 1 in trees even with pinning (not counting the pinned vertices). This includes the SAW tree construction as a special case. To see this, for any vertex v, if one of v's children, say u, is pinned to 0 (or 1), then we can just remove u and change the field of v from λ_v to λ'_v = λ_v β (or λ'_v = λ_v/γ), without affecting the marginal probability of v or any other vertex. By our assumptions λ_v < λ_c and β ≤ 1 < γ, we have that λ'_v < λ_c as well. Hence, after removing all pinned vertices, we still have that λ_v < λ_c for all v ∈ V. This reduces to Claim 3.3.
Indeed, both Theorem 1 and Theorem 2 can be generalized to the setting where vertices may have different external fields, as long as they are all below λ_c, as follows.
Theorem 4. Let (β, γ) be two parameters such that βγ > 1 and β ≤ γ. Let T_v and T_{v'} be two trees with roots v and v' respectively, and let λ = max_{u∈T_v∪T_{v'}} {π(u)}. If λ < λ_c and, in the first ℓ levels, T_v and T_{v'} have the same structure and external fields for corresponding pairs of vertices, then $|p_v - p_{v'}| \le \exp(-\Omega(\ell))$. Theorem 5. Let (β, γ) be two parameters such that βγ > 1 and β ≤ 1 < γ. Let G = (V, E) be a graph with n vertices, and let π be a field on G with an upper bound λ < λ_c. Then Z_{β,γ,π}(G) can be approximated deterministically within a relative error ε in time polynomial in n, λ, and 1/ε. To show Theorem 4 and Theorem 5, we will apply Lemma 2.9 and Lemma 2.10. Essentially we only need to show the existence of a universal potential function.
Proof of Theorems 4 and 5. We claim that Φ_2(x) is a universal potential function for any field π with an upper bound λ, with contraction ratio α_λ given above and a base M that will be determined shortly. Theorem 4 and Theorem 5 then follow from Φ_2(x) combined with Lemma 2.9 and Lemma 2.10, respectively. We verify the two conditions in Definition 2.8. For Condition 1, it is easy to see that in case (4), ϕ_2(x) = 1/t for any x ∈ (0, λ], and in case (5), e_λ ≤ ϕ_2(x) ≤ 1/t for any x ∈ (0, λ]. For Condition 2, we have that (by (6)) Moreover, $F_d(\mathbf{x}) < \lambda\left(\frac{\beta\lambda+1}{\lambda+\gamma}\right)^d$ for any x_i ∈ (0, λ], and $\frac{\beta\lambda+1}{\lambda+\gamma} < 1$ by Lemma 3.1. Then there exists Hence, for any d > d_0, Therefore, there exists an integer M ≥ d_0 such that for any . Condition 2 holds.

Heuristics behind Φ 2 (x)
The most intricate part of our proofs of Theorem 4 and Theorem 5 is the choice of the potential function Φ 2 (x) given by (5). Here we give a brief heuristic of deriving it. It is more of an "educated guess" than a rigorous argument.
We want to pick Φ_2(x) such that Condition 2 holds. In particular, we want It is fair to assume that the left hand side of the equation above takes its maximum when all x_i's are equal. Hence, we hope the following to hold where $f_d(x) = \lambda\left(\frac{\beta x+1}{x+\gamma}\right)^d$ is the symmetrized version of F_d(x). We will use z := f_d(x) to simplify notation. Since we want (8) to hold for all degrees d, we hope to eliminate d from the left hand side of (8). Notice that ϕ_2(x) should be independent of d. Therefore, we take the derivative of $\varphi_2(f_d(x))f_d'(x)$ with respect to d and get We may achieve our goal of eliminating d by imposing the sum in the last parenthesis to be 0, namely From (9), it is easy to see that $\varphi_2(x) = \frac{1}{x\log(\lambda/x)}$ satisfies our need. To get the full definition of (5), we apply a thresholding trick to keep ϕ_2(x) bounded away from 0.
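The defining property of this choice, namely that ϕ_2(f_d(x)) f_d'(x) does not depend on d, can be verified directly; the parameters are hypothetical, and x is kept in a range where f_d(x) < λ so the logarithm is positive.

```python
import math

beta, gamma, lam = 0.5, 4.0, 50.0           # hypothetical, beta*gamma > 1

def f(x, d):
    return lam * ((beta * x + 1) / (x + gamma)) ** d

def fprime(x, d):
    # f_d'(x) = d * f_d(x) * (beta*gamma - 1) / ((beta*x + 1) * (x + gamma))
    return d * f(x, d) * (beta * gamma - 1) / ((beta * x + 1) * (x + gamma))

def phi2(x):
    # candidate potential derivative phi_2(x) = 1 / (x * log(lam / x))
    return 1.0 / (x * math.log(lam / x))

x = 1.3
vals = [phi2(f(x, d)) * fprime(x, d) for d in (1, 2, 5, 17)]   # all equal, by design
```

Algebraically, log(λ/f_d(x)) = −d·log((βx+1)/(x+γ)), so the factor d cancels; the numerical values agree to machine precision.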

Discussion of the β > 1 case
We cannot combine the conditions of Theorem 4 and Theorem 5 to obtain an FPTAS. In particular, when β > 1 strong spatial mixing fails for any λ, even if λ < λ_c. To see this, given a ∆-ary tree T, we can append t children to every vertex in T to get a new tree T' and impose a partial configuration σ where all these new children are pinned to 0. Effectively, the tree T' is then equivalent to T where every vertex has a new external field λβ^t, which is larger than λ_c^{int} if t is sufficiently large, regardless of λ. Then by Proposition 2.4, long range correlation exists in T' with the partial configuration σ, and strong spatial mixing fails.
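The effective-field computation above is elementary and can be confirmed by brute force on a tiny star; β > 1 here and all names and parameters are hypothetical.

```python
from itertools import product

beta, gamma, lam = 1.5, 2.0, 0.7            # hypothetical, beta > 1, beta*gamma > 1

def ratio_star(t):
    """R_v for a star: center v, one free leaf u (field lam), and t leaves pinned to 0."""
    w = {0: 0.0, 1: 0.0}
    for sv, su in product((0, 1), repeat=2):
        weight = (lam if sv == 0 else 1.0) * (lam if su == 0 else 1.0)
        if sv == 0 and su == 0:
            weight *= beta
        if sv == 1 and su == 1:
            weight *= gamma
        if sv == 0:
            weight *= beta ** t             # t neighbors pinned to 0
        w[sv] += weight
    return w[0] / w[1]

def ratio_boosted(t):
    # same star with the pinned leaves removed and v's field boosted to lam * beta**t
    return (lam * beta ** t) * (beta * lam + 1) / (lam + gamma)
```

The two computations coincide for every t, so pinning t children to 0 multiplies the center's field by β^t, which grows without bound when β > 1.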
On the other hand, it is easy to see from the proof that Theorem 4 can be generalized to allow a partial configuration σ on some subset Λ where the marginal probability of every vertex v ∈ Λ satisfies $p^\sigma_v \le \frac{\lambda_c}{\lambda_c+1}$. This is not the case for the SAW tree on which our algorithm relies when β > 1. However, the following observation shows that if $\lambda_v \le \lambda_c \le \frac{\gamma-1}{\beta-1}$, then the marginal probability in any instance G satisfies this requirement. Thus, it seems the only missing piece needed to obtain an algorithm is to design a better recursion tree than the SAW tree.
Proposition 3.5. Let (β, γ) be two parameters such that 1 ≤ β ≤ γ and βγ > 1. Let $\lambda \le \frac{\gamma-1}{\beta-1}$ be another parameter. For any graph G = (V, E) and any field π with upper bound λ, $p_{G;v} \le \frac{\lambda}{\lambda+1}$ for every vertex v ∈ V. To prove this proposition, we need to use the random cluster formulation of 2-spin models. Let G be a graph and e = (v_1, v_2) be one of its edges. Let G^+ be the graph where the edge e is contracted, and G^− be the graph where e is removed. Moreover, in G^+, we assign $\pi(\tilde{v}) = \lambda_{v_1}\lambda_{v_2}\frac{\beta-1}{\gamma-1}$, where $\tilde{v}$ is the vertex obtained from contracting e. Then we have that $Z(G) = Z(G^-) + (\gamma-1)Z(G^+)$, (10) where we write Z(G) instead of Z_{β,γ,π}(G) to simplify the notation. To show the equation above we only need a simple adaptation of the random cluster formulation of the Ising model to the 2-spin setting.
Proof of Proposition 3.5. Suppose $G = (V, E)$ where $|V| = n$ and $|E| = m$. We show the claim by induction on $(m, n)$. Clearly the statement holds when $m = 0$ or $n = 1$. Hence we may assume that the claim holds for $(m', n)$ where $m' < m$, as well as for $(m', n')$ where $n' < n$, and show that it holds for $(m, n)$. Pick an arbitrary edge $e = (v_1, v_2)$ in $G$. Let $G^+$ and $G^-$ be as in the random cluster formulation. It is easy to see that $\pi(\hat{v}) = \lambda_{v_1}\lambda_{v_2}\frac{\beta-1}{\gamma-1} \le \lambda$. Hence both $G^+$ and $G^-$ satisfy the induction hypothesis. It implies that $p_{G^-;v} \le \frac{\lambda}{\lambda+1}$ for any $v$, where $p_{G^-;v}$ is the marginal probability of $v$ in $G^-$. Moreover, $p_{G^+;v} \le \frac{\lambda}{\lambda+1}$ for any $v \in V^+$, where $V^+$ is the vertex set of $G^+$. Let $\delta$ be the mapping $V \to V^+$ such that $\delta(v) = v$ if $v \ne v_1, v_2$ and $\delta(v_1) = \delta(v_2) = \hat{v}$. Then using (10), we have that for any vertex $v \in V$, $p_{G;v} = \frac{Z_v(G^-) + (\gamma-1)Z_v(G^+)}{Z(G^-) + (\gamma-1)Z(G^+)} \le \max\left\{p_{G^-;v},\, p_{G^+;\delta(v)}\right\} \le \frac{\lambda}{\lambda+1}$, where $Z_v(\cdot)$ denotes the contribution to the partition function from configurations in which $v$ (resp. $\delta(v)$) receives the spin being measured, and in the last line we use the induction hypotheses.
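As a sanity check, the random cluster identity $Z(G) = Z(G^-) + (\gamma-1)Z(G^+)$ with $\pi(\hat{v}) = \lambda_{v_1}\lambda_{v_2}\frac{\beta-1}{\gamma-1}$ can be verified by brute-force enumeration on a tiny instance. The sketch below uses a hypothetical 3-vertex path and assumes the external field multiplies the spin whose diagonal edge weight is $\beta$, the convention consistent with the formula for $\pi$:

```python
from itertools import product

beta, gamma = 1.5, 3.0            # 1 <= beta <= gamma and beta*gamma > 1
lam = {1: 0.8, 2: 0.7, 3: 0.9}    # vertex fields, all <= (gamma-1)/(beta-1) = 4

def edge_w(a, b):
    # diagonal interactions: (0,0) -> beta, (1,1) -> gamma, otherwise 1
    return beta if a == b == 0 else (gamma if a == b == 1 else 1.0)

def Z(vertices, edges, field):
    # brute-force partition function; spin 0 carries the external field
    total = 0.0
    for spins in product([0, 1], repeat=len(vertices)):
        s = dict(zip(vertices, spins))
        w = 1.0
        for u, v in edges:
            w *= edge_w(s[u], s[v])
        for v in vertices:
            if s[v] == 0:
                w *= field[v]
        total += w
    return total

# G is the path 1 - 2 - 3, with e = (1, 2)
ZG = Z([1, 2, 3], [(1, 2), (2, 3)], lam)
Zminus = Z([1, 2, 3], [(2, 3)], lam)             # e removed
pi = dict(lam)
pi["c"] = lam[1] * lam[2] * (beta - 1) / (gamma - 1)
Zplus = Z(["c", 3], [("c", 3)], pi)              # e contracted into vertex c

assert abs(ZG - (Zminus + (gamma - 1) * Zplus)) < 1e-9
```

The same enumeration check works for any small graph, since the identity is local to the chosen edge.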

Correlation Decay Beyond λ c
Let β, γ be two parameters such that β ≤ 1 < γ and βγ > 1. In this section we give an example to show that if ∆ c is not an integer, then correlation decay still holds for a small interval beyond λ c .
To simplify the presentation, we assume that $\pi$ is a uniform field, i.e., $\pi(v) = \lambda$ for every $v$. Note that the potential function $\varphi_2(x)$ does not extend beyond $\lambda_c$.
Define the constant $t$ as in (11). We consider the potential function $\Phi_3(x)$ with $\varphi_3(x) := \frac{1}{x\left(\log(1+1/x)+t\right)}$. With this choice, we do a change of variables: let $r_i = \frac{\beta x_i+1}{x_i+\gamma}$ and $s_i = \log r_i$. As $r_i \in \left(\frac{1}{\gamma}, \beta\right)$, we have $s_i \in (-\log\gamma, \log\beta)$. Let $\rho(x)$ be the function obtained under this change of variables. Then $\rho(x)$ is concave for any $x \in (-\log\gamma, \log\beta)$; this is (12), and it can be easily verified from the second derivative, where in the last step we use (11) and the fact that $1/\gamma \le e^x \le \beta$. Hence, by concavity, we obtain (13) for any $x_i \in (0, \lambda]$, where $\hat{x} > 0$ is the unique solution such that $f_d(\hat{x}) = F_d(\mathbf{x})$. Next we show that there exists an $\alpha < 1$ such that $c_{\varphi_3,d}(x) < \alpha$ for any integer $d$ and any $x > 0$. In fact, by (11) and our choice of $t$, it is not hard to show that, over real-valued $d$, the maximum of $c_{\varphi_3,d}(x)$ is achieved at $x = \gamma/\beta$ and $d = \Delta_c$; this maximum is $1$ if $\lambda = \lambda_c$ and is larger than $1$ if $\lambda > \lambda_c$. However, since the degree $d$ has to be an integer, we can verify that for any integer $1 \le d \le 100$, the maximum of $c_{\varphi_3,d}(x)$ is $c_{\varphi_3,22}(x_{22}) = 0.999983$, where $x_{22} \approx 1.83066$. If $d > 100$, then $c_{\varphi_3,d}(x)$ can be bounded for all $x > 0$ using the constant $C_1 < 0.481875$, the maximum of $\log(1+\lambda^{-1}\beta^{-d})+t$ over $d > 100$. Then, due to (13), we have that for any $x_i \in (0, \lambda]$, $C_{\varphi_3,d}(\mathbf{x}) < \alpha = 0.999983 < 1$. This is the counterpart of the bound $C_{\varphi_2,d}(\mathbf{x}) < \alpha_\lambda$ in the proof of Theorem 5. To make $\varphi_3(x)$ satisfy Condition 1 and Condition 2 in Definition 2.8, it is sufficient to apply the same "chop-off" trick to $\varphi_3(x)$ as in (5). We omit the details here. It is easy to see that the argument above works for any $\beta \le 1 < \gamma$ with $\beta\gamma > 1$, except for (12), the concavity of $\rho(x)$. Indeed, the concavity does not hold if, say, $\beta = 1$ and $\gamma = 2$. Nevertheless, the key point here is that $\lambda_c$ is not the tight bound for FPTAS. Short of a conjectured optimal bound, we did not try to optimize the potential function or the applicable range of the proof above.

Limitations of Correlation Decay
In this section, we discuss some limitations of approximation algorithms for ferromagnetic 2-spin models based on correlation decay analysis.
The problem of counting independent sets in bipartite graphs (#BIS) plays an important role in classifying approximate counting complexity. #BIS is not known to have any efficient approximation algorithm, despite many attempts. However, there is no known approximation-preserving reduction (AP-reduction) from #SAT to #BIS either. #BIS is conjectured to have intermediate approximation complexity and, in particular, to have no FPRAS [DGGJ03].
We then consider fields with some constant bounds. Recall that $\lambda_c^{int} = (\gamma/\beta)^{\frac{\lceil\Delta_c\rceil+1}{2}}$ and $\lambda_c^{int'} = (\gamma/\beta)^{\frac{\lfloor\Delta_c\rfloor+2}{2}}$. Then $\lambda_c^{int'} = \lambda_c^{int}$ unless $\Delta_c$ is an integer. By a reduction from anti-ferromagnetic 2-spin models in bipartite graphs, we have the following hardness result, which was first observed in [LLZ14, Theorem 3].
The reduction goes as follows. Anti-ferromagnetic Ising models with a constant non-trivial field in bounded-degree bipartite graphs are #BIS-hard if the uniqueness condition fails [CGG+14]. Given such an instance, we may first flip the truth table of one side. This effectively results in a ferromagnetic Ising model on the same bipartite graph, with two different fields, one on each side. By a standard diagonal transformation, we can transform such an Ising model into any ferromagnetic 2-spin model, with various local fields depending on the degree. It can be verified that for any $\lambda > \lambda_c^{int'}$, we may pick a field in the anti-ferromagnetic Ising model to start with, such that uniqueness fails and, after the transformation, the largest field in use is at most $\lambda$.
We note that $\lambda_c$ is not the tight bound for FPTAS, as observed in Proposition 4.1. Since the degree $d$ has to be an integer, with an appropriate choice of the potential function there is a small interval beyond $\lambda_c$ in which strong spatial mixing still holds. Interestingly, it seems that $\lambda_c^{int}$ is not the right bound either. Let us give a concrete example. Let $\beta = 1$ and $\gamma = 2$. Then $\Delta_c = \frac{\sqrt{2}+1}{\sqrt{2}-1} = 3+2\sqrt{2} \approx 5.8284$. Hence $\lambda_c \approx 10.6606$ and $\lambda_c^{int} = 2^{\frac{6+1}{2}} \approx 11.3137$. However, even if $\lambda < \lambda_c^{int}$, the system may not exhibit spatial mixing, in either the strong or the weak sense. In fact, even spatial mixing in the sense of Theorem 1 does not necessarily hold if $\lambda < \lambda_c^{int}$. To see this, we take any $\lambda \in [10.9759, 10.9965]$, so that $\lambda_c < \lambda < \lambda_c^{int}$. Consider an infinite tree where each vertex at an even layer has $5$ children and each vertex at an odd layer has $7$ children. There is more than one Gibbs measure on this tree. This can be easily verified from the fact that the two-layer recursion $f_5(f_7(x))$ has three fixed points $\hat{x} = f_5(f_7(\hat{x}))$. In addition, all three fixed points satisfy $\hat{x}_i < \lambda_c$ for $i = 1, 2, 3$. Consider a tree $T$ of depth $2\ell$ with alternating degrees $5$ and $7$, and another tree $T'$ with the same structure in the first $2\ell$ layers as $T$ but with one more layer in which each vertex has, say, $50$ children. It is not hard to verify that as $\ell$ increases, the marginal ratio at the root of $T$ converges to $\hat{x}_3$, whereas the ratio at the root of $T'$ converges to $\hat{x}_1$. This example indicates that one should not expect correlation decay algorithms to work all the way up to $\lambda_c^{int}$. Finally, if we consider the uniform field case #2Spin$(\beta, \gamma, \lambda)$, then our tractability results still hold. However, to extend the hardness results, as in Proposition 5.1, from an interval of fields to a uniform one, there seems to be some technical difficulty. Suppose we want to construct a combinatorial gadget to effectively realize another field.
There is a gap between $\lambda$ and the next largest field that can be realized. This is why in [LLZ14] there are some extra conditions when passing from an interval of fields to the uniform case. The observation above about the failure of SSM in irregular trees may suggest a random bipartite construction with uneven degrees. However, analyzing such a gadget is beyond the scope of the current paper.
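The fixed-point computation behind this example is easy to reproduce numerically. The following sketch (with $f_d(x) = \lambda\left(\frac{x+1}{x+2}\right)^d$ for $\beta = 1$, $\gamma = 2$, and a field inside the stated interval) counts the crossings of $f_5(f_7(x))$ with the diagonal:

```python
import math

lam = 10.99                       # inside [10.9759, 10.9965]
delta_c = (math.sqrt(2) + 1) / (math.sqrt(2) - 1)   # = 3 + 2*sqrt(2)
lam_c = 2 ** ((delta_c + 1) / 2)
lam_int = 2 ** 3.5                # exponent (ceil(delta_c) + 1)/2 = 7/2
assert abs(lam_c - 10.6606) < 1e-3 and abs(lam_int - 11.3137) < 1e-3
assert lam_c < lam < lam_int

def f(d, x):                      # symmetric recursion for beta = 1, gamma = 2
    return lam * ((x + 1.0) / (x + 2.0)) ** d

def g(x):                         # two-layer recursion, degrees 5 and 7
    return f(5, f(7, x))

# count sign changes of g(x) - x on a fine grid
xs = [i * 1e-3 for i in range(1, 12001)]
signs = [g(x) > x for x in xs]
fixed_pts = [x for x, a, b in zip(xs, signs, signs[1:]) if a != b]
assert len(fixed_pts) == 3                  # three fixed points ...
assert all(x < lam_c for x in fixed_pts)    # ... all below lambda_c
```

The three crossings locate $\hat{x}_1 < \hat{x}_2 < \hat{x}_3$; the outer two are stable and the middle one is unstable, which is what drives the roots of $T$ and $T'$ to different limits.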

Missing Proofs
Finally, we gather the technical details and proofs that are omitted in Section 2.2, Section 2.3, and Section 3.2.

Details about the Uniqueness Threshold
We now prove Proposition 2.3 and Proposition 2.4. Technically, by only considering the symmetric recursion $f_d(x) = \lambda\left(\frac{\beta x+1}{x+\gamma}\right)^d$, we are implicitly assuming uniform boundary conditions. If $f_d(x)$ has more than one fixed point, then clearly there are multiple Gibbs measures. Hence, $f_d(x)$ having only one fixed point is a necessary condition for the uniqueness condition in $T_{d+1}$. Moreover, it is also sufficient. The reason is that the influence on the root of an arbitrary boundary condition is bounded between those of the all-"0" and all-"1" boundary conditions.
We first do some calculations. Taking the derivative of $f_d(x)$ gives $f_d'(x) = \frac{d(\beta\gamma-1)}{(\beta x+1)(x+\gamma)}\,f_d(x)$. Taking the second derivative then shows that $f_d''(x)$ changes sign exactly once, at $x^* = \frac{d(\beta\gamma-1)-(\beta\gamma+1)}{2\beta}$. Therefore, for $x \ge x^*$, $f_d$ is concave and hence has at most one fixed point in that range.
Since $f_d(x)$ has only one inflection point, there are at most three fixed points. Moreover, the uniqueness condition is equivalent to saying that every fixed point $\hat{x}$ of (14) satisfies $f_d'(\hat{x}) \le 1$.
Recall that $\Delta_c := \frac{\sqrt{\beta\gamma}+1}{\sqrt{\beta\gamma}-1}$. If $d < \Delta_c$, we have that for any $x > 0$, $\frac{d(\beta\gamma-1)x}{(\beta x+1)(x+\gamma)} \le \frac{d(\beta\gamma-1)}{(\sqrt{\beta\gamma}+1)^2} = \frac{d}{\Delta_c} < 1$, with the maximum attained at $x = \sqrt{\gamma/\beta}$. In particular, $f_d'(\hat{x}_d) < 1$ for any fixed point $\hat{x}_d$, and the uniqueness condition holds. This proves Proposition 2.3. To show Proposition 2.4, we may assume that $d \ge \Delta_c$. We may also assume that $\beta \le \gamma$. The equation $(\beta x+1)(\gamma+x) = d(\beta\gamma-1)x$ has two solutions, which are $x_{0,1} = x^* \mp \sqrt{(x^*)^2 - \gamma/\beta}$. Notice that both of them are positive, since $x_0 + x_1 = 2x^* > 0$ and $x_0x_1 = \gamma/\beta$.
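The bound above is easy to check numerically. The sketch below grid-maximizes the derivative factor $\frac{d(\beta\gamma-1)x}{(\beta x+1)(x+\gamma)}$ for sample parameters (here $\beta = 1$, $\gamma = 2$, $d = 5 < \Delta_c \approx 5.83$) and compares it with $d/\Delta_c$:

```python
import math

beta, gamma, d = 1.0, 2.0, 5
delta_c = (math.sqrt(beta * gamma) + 1) / (math.sqrt(beta * gamma) - 1)

def deriv_factor(x):
    # value of f_d'(x) at a fixed point x: d*(bg-1)*x / ((b*x+1)*(x+g))
    return d * (beta * gamma - 1) * x / ((beta * x + 1) * (x + gamma))

# grid-maximize over x > 0
best = max(deriv_factor(i * 1e-4) for i in range(1, 200000))
assert abs(best - d / delta_c) < 1e-6
# the maximizer is x = sqrt(gamma/beta)
assert abs(deriv_factor(math.sqrt(gamma / beta)) - d / delta_c) < 1e-12
assert best < 1  # uniqueness for d < Delta_c
```

Re-running with $d = 6 > \Delta_c$ makes the maximum exceed $1$, matching the dichotomy in Propositions 2.3 and 2.4.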
We show that $f_d(x_0) > x_0$ or $f_d(x_1) < x_1$ is equivalent to the uniqueness condition. First assume this condition does not hold, that is, $f_d(x_0) \le x_0$ and $f_d(x_1) \ge x_1$. If either holds with equality, then $x_0$ or $x_1$ is a fixed point at which the derivative is $1$, so we have non-uniqueness. Otherwise, we have $f_d(x_0) < x_0$ and $f_d(x_1) > x_1$. Since $x_0 < x_1$, there is some fixed point $\hat{x}$ satisfying $f_d(\hat{x}) = \hat{x}$ and $x_0 < \hat{x} < x_1$, which implies that $(\beta\hat{x}+1)(\hat{x}+\gamma) < d(\beta\gamma-1)\hat{x}$. Therefore $f_d'(\hat{x}) > 1$ and non-uniqueness holds.
To show the other direction, suppose $f_d(x_0) > x_0$. Assume for contradiction that $f_d(x)$ has three fixed points, denoted by $\hat{x}_0 < \hat{x}_1 < \hat{x}_2$. Then the middle fixed point $\hat{x}_1$ satisfies $f_d'(\hat{x}_1) > 1$. Therefore $\hat{x}_1 > x_0$, and there are two fixed points larger than $x_0$. However, for $x \in (x_0, x_1)$, as long as $f_d(x) \ge x$ we have $f_d'(x) > 1$; since $f_d(x_0) > x_0$, this keeps $f_d(x) - x$ positive, and hence there is no fixed point in this interval. For $x > x^*$, the function is concave and has at most one fixed point. So there is only one fixed point larger than $x_0$, a contradiction. The case $f_d(x_1) < x_1$ is similar. These two conditions can be rewritten as (15) and (16), and the right-hand sides of both (15) and (16) do not involve $\lambda$. We want to see how conditions (15) and (16) change as $d$ changes. Treat $d$ as a continuous variable and, for $i = 0, 1$, view the right-hand side of the corresponding condition as a function of $d$ (through $x_i$, which depends on $\beta$, $\gamma$, and $d$). Taking the derivative shows that if $\beta \le 1$, these two functions are increasing in $d$.
Thus if $\lambda < \lambda_c^{int}$, (16) holds for all integers $d$. On the other hand, there is no $\lambda$ such that (15) holds for all integers $d$. This proves Proposition 2.4. If $\beta > 1$, then neither (15) nor (16) can hold for all integers $d$. The reason is that $d(\beta\gamma-1)x_0 < \gamma$ for sufficiently large $d$, and $\beta^2 x_1 > d(\beta\gamma-1)$ for sufficiently large $d$.

Details about the Potential Method
In this section we provide the missing details and proofs from Section 2.3.
To study correlation decay on trees, we use the standard recursion given in (2). Recall that $T$ is a tree with root $v$. Vertices $v_1, \dots, v_d$ are the $d$ children of $v$, and $T_i$ is the subtree rooted at $v_i$. A configuration $\sigma_\Lambda$ is imposed on a subset $\Lambda$ of vertices, and $R_T^\sigma$ denotes the ratio of marginal probabilities at $v$ given the partial configuration $\sigma$ on $T$.
We want to study the influence of another set of vertices, say S, upon v. In particular, we want to study the range of ratios at v over all possible configurations on S. To this end, we define the lower and upper bounds as follows. Notice that as S will be fixed, we may assume that it is a subset of Λ.
Definition 6.1. Let $T, v, \Lambda, \sigma_\Lambda, S, R_T^\sigma$ be as above. Define $\underline{R}_v := \min_{\tau_\Lambda} R_T^{\tau_\Lambda}$ and $\overline{R}_v := \max_{\tau_\Lambda} R_T^{\tau_\Lambda}$, where $\tau_\Lambda$ may differ from $\sigma_\Lambda$ only on $S$. Define $\delta_v := \overline{R}_v - \underline{R}_v$. Our goal is thus to prove that $\delta_v \le \exp(-\Omega(\mathrm{dist}(v, S)))$. We can recursively calculate $\underline{R}_v$ and $\overline{R}_v$ as follows. The base cases are: 1. $v \in S$, in which case $\underline{R}_v = 0$, $\overline{R}_v = \infty$, and $\delta_v = \infty$; 2. $v \in \Lambda \setminus S$, in which case $v$ is fixed to be blue (or green), $\underline{R}_v = \overline{R}_v$, and $\delta_v = 0$; 3. $v \notin \Lambda$ and $v$ is the only node of $T$, in which case $\underline{R}_v = \overline{R}_v = \lambda_v$ and $\delta_v = 0$.
For $v \notin \Lambda$, since $F_d$ is monotonically increasing with respect to each $x_i$ whenever $\beta\gamma > 1$, we have $\underline{R}_v = F_d(\underline{R}_{v_1}, \dots, \underline{R}_{v_d})$ and $\overline{R}_v = F_d(\overline{R}_{v_1}, \dots, \overline{R}_{v_d})$, where $\underline{R}_{v_i}$ and $\overline{R}_{v_i}$ are the recursively defined lower and upper bounds of $R_{T_i}^{\tau_\Lambda}$ for $1 \le i \le d$. Our goal is to show that $\delta_v$ decays exponentially in the depth of the recursion under certain conditions, such as uniqueness. A straightforward approach would be to prove that $\delta_v$ contracts by a constant ratio at each recursion step. This is a sufficient, but not necessary, condition for exponential decay: there are circumstances in which $\delta_v$ does not decay at every step but does decay in the long run. To amortize this behaviour, we use a potential function $\Phi(x)$ and show that the correlation in the new recursion decays by a constant ratio.
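The interval recursion can be illustrated on a toy tree. The sketch below (hypothetical structure and parameters; it assumes the ratio convention $R_v = \lambda\prod_i\frac{\beta x_i+1}{x_i+\gamma}$, where a pinned child has ratio $x = 0$ or $x = \infty$) plugs the children's lower and upper bounds into $F_d$ and checks that the true ratio under every assignment of $S$ stays inside the interval:

```python
import math
from itertools import product

beta, gamma, lam = 0.8, 2.0, 1.5   # beta*gamma > 1, so each factor is increasing

def factor(x):
    # edge factor (beta*x+1)/(x+gamma); x = inf and x = 0 are the two pinnings
    return beta if math.isinf(x) else (beta * x + 1.0) / (x + gamma)

def F(child_ratios):
    r = lam
    for x in child_ratios:
        r *= factor(x)
    return r

# Tree: root -> {a, b}; each of a, b has one child lying in S.
# Bounds for an S-vertex: its ratio ranges over [0, inf].
lo_a, hi_a = F([0.0]), F([math.inf])          # monotonicity in each coordinate
lo_root, hi_root = F([lo_a, lo_a]), F([hi_a, hi_a])

# Exact ratio for each of the four pinnings of the two S-vertices.
for p1, p2 in product([0.0, math.inf], repeat=2):
    r = F([F([p1]), F([p2])])
    assert lo_root - 1e-12 <= r <= hi_root + 1e-12
```

Monotonicity of each factor (its derivative has the sign of $\beta\gamma-1$) is exactly what makes the endpoint substitution valid.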
To be more precise, the potential function $\Phi: \mathbb{R}^+ \to \mathbb{R}^+$ is a differentiable and monotonically increasing function. It maps the domain of the original recursion to a new one. Let $y_i = \Phi(x_i)$. We want to consider the recursion for the $y_i$'s. The new recursion function, which is the pullback of $F_d$, is defined as $G_d(\mathbf{y}) = \Phi\left(F_d\left(\Phi^{-1}(y_1), \dots, \Phi^{-1}(y_d)\right)\right)$. The relationship between $F_d(\mathbf{x})$ and $G_d(\mathbf{y})$ is illustrated in Figure 1.

Figure 1: Commutative diagram between $F_d$ and $G_d$.
We want to prove Lemma 2.6 and Lemma 2.10. To do so, we also define the upper and lower bounds of $y$. Define $\underline{y}_v = \Phi(\underline{R}_v)$ and accordingly $\overline{y}_v = \Phi(\overline{R}_v)$. Let $\epsilon_v = \overline{y}_v - \underline{y}_v$. For a good potential function, exponential decay of $\epsilon_v$ is sufficient to imply that of $\delta_v$.
Lemma 6.2. Let Φ(x) be a good potential function for the field λ at v. Then there exists a constant C such that δ v ≤ Cε v for any dist(v, S) ≥ 2.
Proof. By (17) and the Mean Value Theorem, there exists an $\tilde{R} \in [\underline{R}_v, \overline{R}_v]$ such that $\epsilon_v = \Phi(\overline{R}_v) - \Phi(\underline{R}_v) = \varphi(\tilde{R})\,\delta_v$. Since $\mathrm{dist}(v, S) \ge 2$, we have that $\underline{R}_v \ge \lambda\gamma^{-d}$ and $\overline{R}_v \le \lambda\beta^d$. Hence $\tilde{R} \in [\lambda\gamma^{-d}, \lambda\beta^d]$, and by Condition 1 of Definition 2.5, there exists a constant $C_1$ such that $\varphi(\tilde{R}) \ge C_1$. Therefore $\delta_v \le \epsilon_v / C_1$.
The next lemma explains Condition 2 of Definition 2.5.

Lemma 6.3. Let $\Phi(x)$ be a good potential function with contraction ratio $\alpha$. Then $\epsilon_v \le \alpha \max_{1\le i\le d} \epsilon_{v_i}$.

Proof. First we use (17): $\epsilon_v = \overline{y}_v - \underline{y}_v = G_d(\overline{y}_{v_1}, \dots, \overline{y}_{v_d}) - G_d(\underline{y}_{v_1}, \dots, \underline{y}_{v_d})$. Let $\mathbf{y}^1 = (\overline{y}_{v_1}, \dots, \overline{y}_{v_d})$ and $\mathbf{y}^0 = (\underline{y}_{v_1}, \dots, \underline{y}_{v_d})$, and let $\mathbf{z}(t) = t\mathbf{y}^1 + (1-t)\mathbf{y}^0$ be a linear combination of $\mathbf{y}^0$ and $\mathbf{y}^1$, where $t \in [0, 1]$. By the Mean Value Theorem, there exists a $\tilde{t}$ such that (19) $\epsilon_v = \sum_{i=1}^d \frac{\partial G_d}{\partial y_i}(\mathbf{z}(\tilde{t}))\,\epsilon_{v_i}$. It is straightforward to calculate that (20) $\frac{\partial G_d}{\partial y_i}(\mathbf{y}) = \frac{\varphi(F_d(\mathbf{R}))}{\varphi(R_i)}\cdot\frac{\partial F_d}{\partial x_i}(\mathbf{R})$, where $R_i = \Phi^{-1}(y_i)$, and $\mathbf{y}$ and $\mathbf{R}$ are the vectors composed of the $y_i$'s and $R_i$'s. Plugging (20) into (19), we get $\epsilon_v \le \left(\varphi(F_d(\tilde{\mathbf{R}}))\sum_{i=1}^d \frac{1}{\varphi(\tilde{R}_i)}\left|\frac{\partial F_d}{\partial x_i}(\tilde{\mathbf{R}})\right|\right)\max_i \epsilon_{v_i} \le \alpha \max_i \epsilon_{v_i}$, where $\tilde{R}_i = \Phi^{-1}(\tilde{y}_i)$, $\tilde{\mathbf{R}}$ is the vector composed of the $\tilde{R}_i$'s, and in the last line we use Condition 2 of Definition 2.5.
Note that the two conditions of a good potential function do not necessarily cover all cases of the tree recursion. At the root we have one more child than at other vertices of a SAW tree. Also, if $v$ has a child $u \in S$, then $\epsilon_u = \infty$ and the ranges in both conditions of Definition 2.5 do not apply. To bound the recursion at the root, we have the following straightforward bound for the original recursion.
Lemma 6.4. Let $(\beta, \gamma)$ be two parameters such that $\beta\gamma > 1$ and $\beta < \gamma$. Let $v$ be a vertex and $v_i$ be its children for $1 \le i \le d$. Suppose $\delta_{v_i} \le C$ for some $C > 0$ and all $1 \le i \le d$. Then $\delta_v \le \frac{d(\beta\gamma-1)\lambda_v\beta^d}{\gamma}\,C$.

Proof. It is easy to see that $\gamma \ge 1$. By the same argument as in Lemma 6.3 applied to (2), there exist $\tilde{x}_i$'s such that $\delta_v = \sum_{i=1}^d \frac{\partial F_d}{\partial x_i}(\tilde{\mathbf{x}})\,\delta_{v_i}$, where $\tilde{\mathbf{x}}$ is the vector composed of the $\tilde{x}_i$'s. Then we have that $\frac{\partial F_d}{\partial x_i}(\mathbf{x}) = \frac{(\beta\gamma-1)F_d(\mathbf{x})}{(\beta x_i+1)(x_i+\gamma)} \le \frac{(\beta\gamma-1)\lambda_v\beta^d}{\gamma}$, where we use the fact that $F_d(\mathbf{x}) \le \lambda_v\beta^d$ for any $x_i \in [0, \infty)$ and $\beta\gamma > 1$. The lemma follows. Now we are ready to prove Lemma 2.6.
Proof of Lemma 2.6. Given $G$ and a partial configuration $\sigma_\Lambda$ on a subset $\Lambda \subseteq V$ of vertices, we first claim that we can approximate $p_v^{\sigma_\Lambda}$ within additive error $\varepsilon$ deterministically in time $O\!\left((\varepsilon/\lambda)^{\log\Delta/\log\alpha}\right)$. We construct the SAW tree $T = T_{SAW}(G, v)$. Due to Proposition 2.1, we only need to approximate $p_v^{\sigma_\Lambda}$ in $T$, with respect to $v$ and an arbitrary vertex set $S$. We will also use $\sigma_\Lambda$ to denote the configuration in $T$ on $\Lambda_{SAW}$. Let $S$ be the set of vertices whose distance to $v$ is larger than $t$, where $t$ is a parameter that we will specify later. Let $\delta_v$ be defined as in Definition 6.1 with respect to $T$, $v$, $\Lambda$, $\sigma_\Lambda$, and $S$. We want to show that $\delta_v = O(\lambda\alpha^t)$.
The maximum degree of $T$ is at most $\Delta$. Thus the root $v$ has at most $\Delta$ children in $T$, and any other vertex in $T$ has at most $\Delta - 1$ children. Assume $v$ has $k \ge 1$ children, as otherwise we are done. We may also assume that $v \notin S$, and let $t = \mathrm{dist}(v, S) - 1 \ge 1$. We recursively construct a path $u_0 = v, u_1, \dots, u_l$ of length $l \le t$ as follows. Given $u_i$: if $u_i$ has no child, then we stop and let $l = i$; otherwise, if $i = t$, we stop and let $l = t$; otherwise $l < t$, and we let $u_{i+1}$ be the child of $u_i$ whose $\epsilon_{u_{i+1}}$ is the maximum $\epsilon$ among all children of $u_i$. In other words, by Lemma 6.3, we have (21): $\epsilon_{u_i} \le \alpha\,\epsilon_{u_{i+1}}$ for all $1 \le i \le l - 1$. Notice that (21) may not hold for $i = 0$, since $v = u_0$ possibly has $\Delta$ children.
First we note that for all $1 \le i \le l$, $\mathrm{dist}(v, u_i) = i \le l \le t$, and therefore $u_i \notin S$. If we stopped at a vertex $u_l$ with no child, then we claim that $\epsilon_{u_l} = 0$. This is because $u_l$ is either a free vertex with no child, or $u_l \in \Lambda$ but $u_l \notin S$. Since $\epsilon_{u_l}$ is the maximum $\epsilon$ among all children of $u_{l-1}$, every child of $u_{l-1}$ has $\epsilon = 0$, which implies $\epsilon_{u_{l-1}} = 0$. Recursively we get $\epsilon_v = \epsilon_{u_0} = 0$, and the claim clearly holds by (18).
Hence we may assume that $l = t$. Since $u_l \notin S$, we have that $\delta_{u_l} \le \lambda_{u_l}\beta^{\Delta-1}$ if $\beta > 1$, or $\delta_{u_l} \le \lambda_{u_l}$ if $\beta \le 1$. Hence by (18) and Condition 1 in Definition 2.5, we have that $\epsilon_{u_l} \le C_0$ for some constant $C_0$. Applying (21) inductively, we have that $\epsilon_{u_1} \le \alpha^{l-1}\epsilon_{u_l} \le \alpha^{t-1}C_0$.
Hence by Lemma 6.2, there exists another constant $C_1$ such that $\delta_{u_1} \le \alpha^{t-1} C_1$. To get a bound on $\delta_{u_0}$, we use Lemma 6.4, which yields $\delta_{u_0} \le \frac{d_0(\beta\gamma-1)\lambda_{u_0}\beta^{d_0}}{\gamma}\,\alpha^{t-1}C_1 = O(\lambda\alpha^t)$, where $d_0 \le \Delta$ is the degree of $v = u_0$.
Hence the recursive procedure returns $\underline{R}_v$ and $\overline{R}_v$ such that $\overline{R}_v - \underline{R}_v \le \delta_v = O(\lambda\alpha^t)$. Let $p_0 = \frac{\underline{R}_v}{\underline{R}_v+1}$ and $p_1 = \frac{\overline{R}_v}{\overline{R}_v+1}$. Then $p_0 \le p_v^{\sigma_\Lambda} \le p_1$ and $p_1 - p_0 \le \overline{R}_v - \underline{R}_v$. The recursive procedure runs in time $O(\Delta^t)$, since it only needs to construct the first $t$ levels of the self-avoiding walk tree. For any $\varepsilon > 0$, let $t = O(\log_\alpha \varepsilon - \log_\alpha \lambda)$ so that $\overline{R}_v - \underline{R}_v < \varepsilon$. This gives an algorithm which approximates $p_v^{\sigma_\Lambda}$ within an additive error $\varepsilon$ in time $O\!\left((\varepsilon/\lambda)^{\log\Delta/\log\alpha}\right)$.
Then we use self-reducibility to reduce computing Z β,γ,π (G) to computing conditional marginal probabilities. To be specific, let σ be a configuration on a subset of V and τ be sampled according to the Gibbs measure. Let p σ v := Pr (τ(v) = 1 | σ) be the conditional marginal probability. We can compute Z β,γ,π (G) from p σ v by the following standard procedure. Let v 1 , . . . , v n enumerate vertices in G. For 0 ≤ i ≤ n, let σ i be the configuration fixing the first i vertices v 1 , . . . , v i as follows: σ i (v j ) = σ i−1 (v j ) for 1 ≤ j ≤ i − 1 and σ i (v i ) is fixed to the spin s so that p i := Pr (τ(v i ) = s | σ i−1 ) ≥ 1/3. This is always possible because clearly Pr (τ(v i ) = 0 | σ i−1 ) + Pr (τ(v i ) = 1 | σ i−1 ) = 1.
In particular, $\sigma_n \in \{0, 1\}^V$ is a configuration of $V$. The Gibbs measure of $\sigma_n$ is $\rho(\sigma_n) = \frac{w(\sigma_n)}{Z_{\beta,\gamma,\pi}(G)}$. On the other hand, we can rewrite $\rho(\sigma_n) = p_1 p_2 \cdots p_n$ by conditional probabilities. Thus $Z_{\beta,\gamma,\pi}(G) = \frac{w(\sigma_n)}{p_1 p_2 \cdots p_n}$. The weight $w(\sigma_n)$ given in (1) can be computed exactly in time polynomial in $n$. Note that $p_i$ equals either $p_{v_i}^{\sigma_{i-1}}$ or $1 - p_{v_i}^{\sigma_{i-1}}$. Since we can approximate $p_v^{\sigma_\Lambda}$ within an additive error $\varepsilon'$ in time $O\!\left((\varepsilon'/\lambda)^{\log\Delta/\log\alpha}\right)$, the configurations $\sigma_i$ can be constructed efficiently, which guarantees that all $p_i$'s are bounded away from $0$. Thus the product $p_1 p_2 \cdots p_n$ can be approximated within a factor of $(1 \pm n\varepsilon')$ in time $O\!\left(n\,(\varepsilon'/\lambda)^{\log\Delta/\log\alpha}\right)$. Now let $\varepsilon' = \varepsilon/n$. We get the claimed FPTAS for $Z_{\beta,\gamma,\pi}(G)$.
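The self-reducibility step can be illustrated end to end on a toy instance. The sketch below (hypothetical 3-vertex path; the parameter values and the choice of which spin carries the field are illustrative assumptions) computes exact conditional marginals by enumeration, fixes vertices one by one, and recovers $Z$ from the telescoping product:

```python
from itertools import product

beta, gamma, lam = 1.5, 2.0, 0.8   # ferromagnetic: beta*gamma > 1

verts = [0, 1, 2]
edges = [(0, 1), (1, 2)]           # a 3-vertex path

def weight(sigma):
    w = 1.0
    for u, v in edges:
        if sigma[u] == sigma[v]:
            w *= beta if sigma[u] == 0 else gamma
    for v in verts:
        if sigma[v] == 1:          # illustrative convention: lambda weights spin 1
            w *= lam
    return w

def Z(pinned):
    # partition function restricted to configurations agreeing with `pinned`
    return sum(weight(s) for s in product([0, 1], repeat=len(verts))
               if all(s[v] == x for v, x in pinned.items()))

# fix vertices one by one, always choosing the likelier spin
pinned, ps = {}, []
for v in verts:
    p1 = Z({**pinned, v: 1}) / Z(pinned)   # conditional marginal of spin 1
    s = 1 if p1 >= 0.5 else 0
    ps.append(p1 if s == 1 else 1.0 - p1)
    pinned[v] = s

prod = 1.0
for p in ps:
    prod *= p
# telescoping: Z = w(sigma_n) / (p_1 * ... * p_n)
assert abs(Z({}) - weight(tuple(pinned[v] for v in verts)) / prod) < 1e-9
```

Always fixing the likelier spin keeps every $p_i \ge 1/2 \ge 1/3$, so the product never degenerates, exactly as required by the procedure above.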
Lemma 2.9 follows almost immediately from Lemmas 6.2, 6.3, and 6.4 as in the proof above. The only issue is that the range of x should be restricted to (0, λ]. This is guaranteed by Claim 3.3.
Finally we show Lemma 2.10.
If $0 < x < \gamma/\beta$, it is sufficient to verify that $g'(x) > 0$. Since $g'(\gamma/\beta) = 0$, we only need to show that $g'(x)$ is decreasing. This is easily verified by taking the derivative again; the last inequality in that computation uses the fact that $\frac{x+\gamma}{\beta x+1} \ge 1$, by Lemma 3.1 and $x < \gamma/\beta$. If $\gamma/\beta < x < \lambda_c$, then we show (3) directly. First notice that equality holds at $x = \gamma/\beta$. Given this, in order to get (3), it is sufficient to show that $h(x) < 0$ for the function $h$ defined by the corresponding display. In fact, $h(x)$ is a decreasing function, as taking its derivative shows. Notice that $h(\gamma/\beta) = 0$. It implies that $h(x) < 0$ for all $x > \gamma/\beta$. This completes the proof.