Approximation via Correlation Decay When Strong Spatial Mixing Fails

Approximate counting via correlation decay is the core algorithmic technique used in the sharp delineation of the computational phase transition that arises in the approximation of the partition function of anti-ferromagnetic two-spin models. Previous analyses of correlation-decay algorithms implicitly depended on the occurrence of strong spatial mixing. This, roughly, means that one uses worst-case analysis of the recursive procedure that creates the sub-instances. In this paper, we develop a new analysis method that is more refined than the worst-case analysis. We take the shape of instances in the computation tree into consideration and we amortise against certain "bad" instances that are created as the recursion proceeds. This enables us to show correlation decay and to obtain an FPTAS even when strong spatial mixing fails. We apply our technique to the problem of approximately counting independent sets in hypergraphs with degree upper-bound ∆ and with a lower bound k on the arity of hyperedges. Liu and Lu gave an FPTAS for k ≥ 2 and ∆ ≤ 5 (lack of strong spatial mixing was the obstacle preventing this algorithm from being generalised to ∆ = 6). Our technique gives a tight result for ∆ = 6, showing that there is an FPTAS for k ≥ 3 and ∆ ≤ 6. The best previously-known approximation scheme for ∆ = 6 is the Markov-chain-simulation-based FPRAS of Bordewich, Dyer and Karpinski, which only works for k ≥ 8. Our technique also applies for larger values of k, giving an FPTAS for k ≥ 1.66∆. This bound is not as strong as existing randomised results, for technical reasons that are discussed in the paper. Nevertheless, it gives the first deterministic approximation schemes in this regime. We further demonstrate that in the hypergraph independent set model, approximating the partition function is NP-hard even within the uniqueness regime.


Introduction
We develop a new method for analysing correlation decays in spin systems. In particular, we take the shape of instances in the computation tree into consideration and we amortise against certain "bad" instances that are created as the recursion proceeds. This enables us to show correlation decay and to obtain an FPTAS even when strong spatial mixing fails. To the best of our knowledge, strong spatial mixing is a requirement for all previous correlation-decay based algorithms. To illustrate our technique, we focus on the computational complexity of approximately counting independent sets in hypergraphs, or equivalently on counting the satisfying assignments of monotone CNF formulas.
The problem of counting independent sets in graphs (denoted #IS) is extensively studied. A beautiful connection has been established, showing that approximately counting independent sets in graphs of maximum degree ∆ undergoes a computational transition which coincides with the uniqueness phase transition from statistical physics on the infinite ∆-regular tree. The computational transition can be described as follows. Weitz [20] designed an FPTAS for counting independent sets on graphs with maximum degree at most ∆ = 5. On the other hand, Sly [17] proved that there is no FPRAS for approximately counting independent sets on graphs with maximum degree at most ∆ = 6 (unless NP = RP). The same connection has been established in the more general context of approximating the partition function of the hard-core model [20,14,17,5,7,18] and in the even broader context of approximating the partition functions of generic antiferromagnetic 2-spin models [16,7,18,11] (which includes, for example, the antiferromagnetic Ising model). As a consequence, the boundary for the existence of efficient approximation algorithms for these models has been mapped out.
Approximate counting via correlation decay is the core technique in the algorithmic developments which enabled the sharp delineation of the computational phase transition. Another standard approach for approximate counting, namely Markov chain Monte Carlo (MCMC) simulation, is also conjectured to work up to the uniqueness threshold, but the current analysis tools that we have do not seem to be powerful enough to show that. For example, sampling independent sets via MCMC simulation is known to have fast mixing only for graphs with degree at most 4 [3,4], rather than obtaining the true threshold of 5. In this work, we consider counting independent sets in hypergraphs with upper-bounded vertex degree, and lower-bounded hyperedge size. A hypergraph H = (V, F) consists of a vertex set V and a set F of hyperedges, each of which is a subset of V . A hypergraph is said to be k-uniform if every hyperedge contains exactly k vertices. Thus, a 2-uniform hypergraph is the same as a graph. We will consider the more general case where each hyperedge has arity at least k, rather than exactly k.
An independent set in a hypergraph H is a subset of vertices that does not contain a hyperedge as a subset. We will be interested in computing Z_H, which is the total number of independent sets in H (also referred to as the partition function of H). Formally, the problem of counting independent sets has two parameters: a degree upper bound ∆ and a lower bound k on the arity of hyperedges. The problem is defined as follows.

Name #HyperIndSet(k, ∆).
Instance A hypergraph H with maximum degree at most ∆ where each hyperedge has cardinality (arity) at least k.
Output The number Z_H of independent sets in H.
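As a concrete (if exponential-time) illustration of the quantity Z_H, the following sketch counts independent sets by enumerating all subsets of the vertex set. This is our own toy code for intuition only (the names are ours, not from the paper); the algorithms in the paper of course avoid this enumeration.

```python
from itertools import combinations

def count_independent_sets(vertices, hyperedges):
    """Brute-force Z_H: count the subsets of `vertices` that do not
    contain any hyperedge as a subset."""
    edges = [frozenset(e) for e in hyperedges]
    total = 0
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            s = set(subset)
            if not any(e <= s for e in edges):
                total += 1
    return total

# A 2-uniform hypergraph is an ordinary graph: the triangle has Z_H = 4
# (the empty set and the three singletons).
z_triangle = count_independent_sets([0, 1, 2], [{0, 1}, {1, 2}, {0, 2}])

# A single hyperedge of arity 3: every subset except {0, 1, 2} is independent.
z_one_edge = count_independent_sets([0, 1, 2], [{0, 1, 2}])
```

Note how a larger arity makes the constraint weaker: the triangle graph forbids every pair, while the single 3-ary hyperedge forbids only the full triple.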
Previously, #HyperIndSet(k, ∆) has been studied using the MCMC technique by Bordewich, Dyer, and Karpinski [1,2] (see also [4]). They give an FPRAS for all k ≥ ∆ + 2 ≥ 5 and for k ≥ 2 and ∆ = 3. Despite equipping path coupling with optimised metrics obtained using linear programming, these bounds are not tight for small k. Liu and Lu [12] showed that there exists an FPTAS for all k ≥ 2 and ∆ ≤ 5 using the correlation decay technique.
Thus, the situation seems to be similar to the graph case -given the analysis tools that we have, correlation-decay brings us closer to the truth than the best-tuned analysis of MCMC simulation algorithms. On the other hand, the technique of Liu and Lu [12] does not extend beyond ∆ = 5. To explain the reason why it does not, we need to briefly describe the correlation-decay-based algorithm framework introduced by Weitz [20]. The main idea is to build a recursive procedure for computing the marginal probability that any given vertex is in the independent set. The recursion works by examining sub-instances with "boundary conditions" which require certain vertices to be in, or out, of the independent set. The recursion structure is called a "computation tree". Nodes of the tree correspond to intermediate instances, and boundary conditions are different in different branches. The computation tree allows one to compute the marginal probability exactly but the time needed to do so may be exponentially large since, in general, the tree is exponentially large. Typically, an approximate marginal probability is obtained by truncating the computation tree to logarithmic depth so that the (approximation) algorithm runs in polynomial time. If the correlation between boundary conditions at the leaves of the (truncated) computation tree and the marginal probability at the root decays exponentially with respect to the depth, then the error incurred from the truncation is small and the algorithm succeeds in obtaining a close approximation.
All previous instantiations under this framework require a property called strong spatial mixing (SSM), which roughly states that, conditioned on any boundary condition on intermediate nodes, the correlation decays. In other words, SSM guards against the worst-case boundary conditions that might be created by the recursive procedure.
Observation 1 follows from the fact that the infinite (∆−1)-ary tree T_{2,∆} can be embedded in the hypertree T_{k,∆}, and from well-known facts about the phase transition on T_{2,∆}.
Observation 1 prevents the generalisation of Liu and Lu's algorithm [12] so that it applies for ∆ ≥ 6, even with an edge-size lower bound k. The problem is that the construction of the computation tree involves constructing intermediate instances in which the arity of a hyperedge can be as small as 2. So, even if we start with a k-uniform hypergraph, the computation tree will contain instances with small hyperedges. Without strong spatial mixing, these small hyperedges cause problems in the analysis. Lu, Yang and Zhang [13] discuss this problem and say "How to avoid this effect is a major open question whose solution may have applications in many other problems." This question motivates our work.
To overcome this difficulty, we introduce a new amortisation technique in the analysis. Since lack of correlation decay is caused primarily by the presence of small-arity hyperedges within the intermediate instances, we keep track of such hyperedges. Thus, we track not only the correlation, but also combinatorial properties of the intermediate instances in the computation tree. Using this idea, we obtain the following result.
Theorem 2. There is an FPTAS for #HyperIndSet(k, ∆) when k ≥ 3 and ∆ ≤ 6.

Note that #HyperIndSet(2, 6) is NP-hard to approximate due to [17], so our result is tight for ∆ = 6. This also shows that ∆ = 6 is the first case where the complexity of approximately counting independent sets differs on hypergraphs and graphs, as for ∆ ≤ 5 both admit an FPTAS [12]. Moreover, Theorem 2 is stronger than the best MCMC algorithm [2] when ∆ = 6, as [2] only works for k ≥ 8.
We also apply our technique to large k.
Theorem 3. Let k and ∆ be two integers such that k ≥ ∆ and ∆ ≥ 200. Then there is an FPTAS for the problem #HyperIndSet(k, ∆).
In the large k case, our result is not substantially stronger than that obtained by analysis of the MCMC algorithm [2] (k ≥ ∆ rather than k ≥ ∆ + 2) but it is incomparable since our algorithm is deterministic rather than randomised. Perhaps more importantly, the bound k ≥ ∆ allows us to connect the problem of counting independent sets in hypergraphs with the problem of counting dominating sets in ∆-regular graphs and show that the latter admits an FPTAS when ∆ is sufficiently large. Recall that a dominating set in a graph G is a subset S of the vertices such that every vertex not in S is adjacent to at least one vertex in S. We then consider the following problem.
Instance A ∆-regular graph G.
Output The number of dominating sets in G.
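The connection between the two problems can be checked by brute force on a tiny example. The sketch below (our own code, not from the paper) uses the standard correspondence: S is a dominating set of G iff its complement contains no closed neighbourhood N[v], so dominating sets of a ∆-regular graph correspond to independent sets of the hypergraph whose hyperedges are the closed neighbourhoods (each of arity ∆ + 1, with each vertex appearing in ∆ + 1 of them).

```python
from itertools import combinations

def count_dominating_sets(n, edges):
    """Count dominating sets of a graph on vertices 0..n-1 by enumeration."""
    closed = {v: {v} for v in range(n)}          # closed neighbourhoods N[v]
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)
    return sum(
        all(closed[v] & set(sub) for v in range(n))
        for r in range(n + 1)
        for sub in combinations(range(n), r)
    )

def count_hypergraph_independent_sets(n, hyperedges):
    return sum(
        not any(e <= set(sub) for e in hyperedges)
        for r in range(n + 1)
        for sub in combinations(range(n), r)
    )

# The 4-cycle is 2-regular; its closed neighbourhoods are 3-element hyperedges.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
hyper = [frozenset({(v - 1) % 4, v, (v + 1) % 4}) for v in range(4)]
# Both counts equal 11 for the 4-cycle.
```

On the 4-cycle, the complement bijection maps, e.g., the dominating set {0, 2} to the independent set {1, 3}, which indeed contains no closed neighbourhood.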
Our theorems have the following corollary.
We remark that Corollary 4 cannot be obtained using the result of [2], i.e., the seemingly small difference between k ≥ ∆ and k ≥ ∆ + 2 does matter in deriving Corollary 4 from Theorems 2 and 3. We should also emphasise that it is necessary to consider ∆-regular graphs as inputs to the dominating set problem, since otherwise for graphs of maximum degree ∆ (not necessarily regular), we show that the problem is NP-hard to approximate for ∆ ≥ 18 (Theorem 52). It is relevant to remark here that we believe that Corollary 4 should hold for all ∆; to do this, it would be sufficient to remove the restriction ∆ ≥ 200 from Theorem 3. Note that, while we do not know how to remove the restriction, it would at least be possible to improve "200" to some smaller number. However, we have chosen to stick with "200" in order to keep the proof accessible. We explain next the difficulties in obtaining Theorems 2 and 3.
The main technical difficulty in correlation-decay analysis is bounding a function that we call the "decay rate". This boils down to solving an optimisation problem with (k − 1)(∆ − 1) variables. In previous work (e.g. [15]), this optimisation has been solved using a so-called "symmetrisation" argument, which reduces the problem to a univariate optimisation via convexity. However, the many variables represent different branches in the computation tree. Since our analysis takes the shape of intermediate instances in the tree into consideration, the symmetrisation argument does not work for us, and different branches take different values at the maximum. This problem is compounded by the fact that the shape of the sub-tree consisting of "bad" intermediate instances is heavily lopsided, and the assignment of variables achieving the maximum is far from uniform. Given these problems, there does not seem to be a clean solution to the optimisation in our analysis. Instead of optimising, we give an upper bound on the maximum decay rate. In Theorem 2, as k and ∆ are small, the number of variables is manageable, and our bounds are much sharper than those in Theorem 3. On the other hand, because of this, the proof of Theorem 3 is much more accessible, and we will use Theorem 3 as a running example to demonstrate our technique.
We also provide some insight on the hardness side. Recall that for graphs it is NP-hard to approximate #IS beyond the uniqueness threshold (∆ = 6) [17]. We prove that it is NP-hard to approximate #HyperIndSet(6, 22) (Corollary 50). In contrast, we show that uniqueness holds on the 6-uniform ∆-regular hypertree iff ∆ ≤ 28 (Corollary 59). Thus, efficient approximation schemes cease to exist well below the uniqueness threshold on the hypertree. In fact, we show that this discrepancy grows exponentially in k: for large k, it is NP-hard to approximate #HyperIndSet(k, ∆) when ∆ ≥ 5 · 2^{k/2} (Theorem 49 and Corollary 51), despite the fact that uniqueness holds on the hypertree for all ∆ ≤ 2^k/(2k) (Lemma 60). Theorem 49 follows from a rather standard reduction to the hard-core model on graphs. Nevertheless, it demonstrates that the computational-threshold phenomena in the hypergraph case (k > 2) are substantially different from those in the graph case (k = 2).
As mentioned earlier, there are models where efficient (randomised) approximation schemes exist (based on MCMC simulation) even though SSM does not hold. In fact, this can happen even when uniqueness does not hold. A striking example is the ferromagnetic Ising model (with external field). As [16] shows, there are parameter regimes where uniqueness holds but strong spatial mixing fails. It is easy to modify the parameters so that even uniqueness fails. Nevertheless, Jerrum and Sinclair [9] gave an MCMC-based FPRAS that applies for all parameters and for general graphs (with no degree bounds). It is still an open question to give a correlation decay based FPTAS for the ferromagnetic Ising model.

Outline of Paper
In Section 2, we first give some preliminaries. We give a formal definition of strong spatial mixing (Section 2.1) and a reformulation of #HyperIndSet(k, ∆) as the problem of counting satisfying assignments in monotone CNF formulas (Section 2.2). This will allow us to use the computation tree used by Liu and Lu [12]. A formal description of the computation tree of [12] is given in Section 2.3.
In Section 3, we give an overview of our proof approach, i.e., the main idea behind our new amortisation technique. Section 3.2 concludes the proofs of Theorems 2 and 3, using two (not yet proved) technical lemmas (Lemmas 21 and 22) which solve a complicated multivariate optimisation problem and represent the bulk of the technical work of the paper. Section 3.3 gives the proof of Corollary 4.
In Section 4, we give the proof of Lemma 21, which applies to the large-∆ setting of Theorem 3 and is by far the technically simpler of the two lemmas. Section 5 contains the proof of the technically more challenging Lemma 22 which applies to the k = 3, ∆ = 6 setting of Theorem 2.
Section 6 gives the formal statements and proofs of the hardness results stated in the Introduction; Section 6.1 has the hardness results for independent sets in hypergraphs and Section 6.2 has the hardness results for dominating sets in graphs. Also, Section 7 studies the uniqueness threshold on the k-uniform ∆-regular hypertree (and gives the proofs of the uniqueness statements made in the Introduction). Finally, Section 8 gives the proofs of several technical inequalities used in Section 5.
Throughout the paper we use computer algebra to prove multivariate polynomial inequalities over the field of real numbers (the coefficients of the polynomials are rational). More specifically, we use the Resolve command in Mathematica. The underlying quantifier elimination algorithm (described in [19]) provides a rigorous decision procedure that determines feasibility of a collection of polynomial inequalities.

Strong Spatial Mixing
For the purposes of this section, it will be convenient to view the independent set model as a 2-spin model. Namely, if H = (V, F) is a hypergraph, each independent set I can be viewed as a {0, 1}-assignment σ to the vertices in V, where a vertex v is assigned the spin 1 under σ if v ∈ I and 0 otherwise. We denote by Ω_H the set of all independent sets in H. The Gibbs distribution µ_H(·) is the uniform distribution over Ω_H. The Gibbs distribution of H can clearly be viewed as the uniform distribution over those assignments σ : V → {0, 1} which encode a valid independent set of H. For an assignment σ : V → {0, 1} and a subset Λ ⊂ V, we denote by σ_Λ the restriction of σ to the subset Λ.
For a hypergraph H = (V, F) and a subset Λ ⊂ V, we denote by H_Λ the subgraph of H induced by Λ, i.e., H_Λ := (Λ, {e ∩ Λ | e ∈ F}). Also, for a vertex v ∈ V and Λ ⊂ V, we denote by dist(v, Λ) the length of the shortest path between v and a vertex of Λ.

Definition 5. Let δ : N → [0, 1]. The independent set model exhibits strong spatial mixing on a hypergraph H = (V, F) with decay rate δ(·) iff for every v ∈ V, for every Λ ⊂ V, and for any two configurations η, η′ : Λ → {0, 1} encoding independent sets of H_Λ, it holds that

|µ_H(σ_v = 1 | σ_Λ = η) − µ_H(σ_v = 1 | σ_Λ = η′)| ≤ δ(dist(v, Λ′)),

where Λ′ denotes the set of vertices in Λ on which η and η′ differ.
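For intuition, the decay of correlation in the definition above can be observed numerically. The following sketch (our own brute-force code, for the graph case, i.e., 2-uniform hypergraphs) computes the marginal of a vertex under two boundary conditions on a path and measures their difference.

```python
from itertools import product

def marginal(n, edges, v, boundary):
    """Pr(v is in the independent set) for a graph on vertices 0..n-1,
    conditioned on a boundary (dict vertex -> 0/1; 1 means 'in')."""
    total = inside = 0
    for sigma in product([0, 1], repeat=n):
        if any(sigma[u] and sigma[w] for u, w in edges):
            continue                      # not an independent set
        if any(sigma[u] != s for u, s in boundary.items()):
            continue                      # inconsistent with the boundary
        total += 1
        inside += sigma[v]
    return inside / total

# Path 0-1-2-3-4: flipping the pinned spin at vertex 4 moves the marginal
# at vertex 0 by |2/5 - 3/8| = 1/40.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
gap = abs(marginal(5, path, 0, {4: 1}) - marginal(5, path, 0, {4: 0}))
```

Moving the pinned vertex farther from v shrinks this gap, which is exactly the kind of decay that δ(·) quantifies.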

Reformulation in terms of Monotone CNF formulas
The problem of counting the independent sets of a hypergraph has an equivalent formulation in terms of monotone CNF formulas. In order to describe the equivalent formulation, we first describe the problem of counting satisfying assignments of monotone CNF formulas.
A monotone CNF formula C consists of a set of variables V and a set of clauses {c_1, c_2, …}. Each clause c_i is associated with some subset S_i of V and is the disjunction of all variables in S_i. The arity of a clause c_i, denoted |c_i|, is defined to be |S_i|. For a variable x ∈ V, its degree d_x(C) is the number of clauses in which x appears. The maximum degree of C is given by max_{x∈V} d_x(C).

Definition 6. Let C_{k,∆} be the set of all monotone CNF formulas which have maximum degree at most ∆ and whose clauses have arity at least k.
Note that a formula in C k,∆ may have some clauses with arbitrarily large arities. A satisfying assignment of the formula is an assignment of truth values to the variables which makes the formula evaluate to "true".
Suppose that H = (V, F) is a hypergraph with maximum degree at most ∆ where each hyperedge has arity at least k. Let C be the corresponding formula in C_{k,∆} with variable set V. The correspondence is that each hyperedge S_i of H is associated with exactly one clause c_i of C. Independent sets of H are in one-to-one correspondence with satisfying assignments of C: a variable is assigned the value "true" in an assignment if and only if it is out of the corresponding independent set.
Going the other direction, any monotone CNF formula can be viewed as a hypergraph. In the technical sections of this paper, we use the monotone CNF formulation.
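The bijection can be checked mechanically. In the sketch below (our own illustrative code), a truth assignment satisfies the formula iff the set of "false" variables is an independent set of the hypergraph, so the two counts agree.

```python
from itertools import combinations, product

def count_satisfying(n, clauses):
    """Satisfying assignments of a monotone CNF over variables 0..n-1."""
    return sum(
        all(any(sigma[x] for x in c) for c in clauses)
        for sigma in product([0, 1], repeat=n)
    )

def count_independent_sets(n, hyperedges):
    return sum(
        not any(e <= set(sub) for e in hyperedges)
        for r in range(n + 1)
        for sub in combinations(range(n), r)
    )

# Hyperedges and clauses use the same subsets of {0, 1, 2, 3}; a variable is
# true iff the corresponding vertex is OUT of the independent set, so the
# two counts agree (11 here).
edges = [frozenset({0, 1, 2}), frozenset({2, 3})]
clauses = [tuple(e) for e in edges]
```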
In this article, when we consider a monotone CNF formula C we will typically use n to denote |V|. Variables in V will be denoted by x_1, x_2, …. When x and C are clear from context, we will sometimes use d to denote d_x(C). When C is clear from context, we will sometimes use ∆ to denote max_{x∈V} d_x(C).

The Computation Tree
In this section, we set up relevant notation and give an exposition of the computation tree of Liu and Lu [12] which will also be used in our proof (though our analysis will be different). The computation tree of Liu and Lu is given in terms of the monotone CNF version of the problem. Below we give the relevant definitions and notation; our notation aligns as much as possible with that of [12].
Our goal is to approximately count the number of satisfying assignments of a formula C ∈ C_{k,∆}, which we denote by Z(C). Since C is monotone, an assignment σ : V → {0, 1} is satisfying if, for every clause c in C, there is at least one variable x ∈ c with σ(x) = 1. Note that Z(C) > 0 since the all-1 assignment satisfies every monotone CNF formula. For convenience, we will use the simplified notation "x = 1" to denote (the set of) satisfying assignments of C in which x is set to 1, and we similarly use "x = 0". We associate the formula C with a probability distribution in which each satisfying assignment has probability mass 1/Z(C). We will denote probabilities with respect to this distribution by Pr_C(·).
Let x be a variable in V. Define R(C, x) := Pr_C(x = 0)/Pr_C(x = 1); this is well-defined since Pr_C(x = 1) > 0 by the monotonicity of C. In fact, the monotonicity of C also implies that 0 ≤ R(C, x) ≤ 1, where the upper bound follows from the fact that, for every satisfying assignment with x = 0, flipping the assignment of x to 1 does not affect satisfiability. Our interest in the quantity R(C, x) stems from the following simple lemma from [12].

Lemma 7 ([12]). Let k and ∆ be positive integers. Suppose that there is a polynomial-time algorithm (in n and 1/ε) that takes an n-variable formula C ∈ C_{k,∆}, a variable x of C, and an ε > 0, and computes a quantity R̂(C, x) satisfying |R̂(C, x) − R(C, x)| ≤ ε. Then, there is an FPTAS which approximates Z(C) for every C ∈ C_{k,∆}.
Proof. The proof is actually identical to the argument in [12, Appendix A]. We include the proof for completeness, and also because an examination of the proof is necessary to check that the FPTAS for approximating Z(C) invokes the algorithm that computes R̂(C, x) only on formulas C whose clauses have arity at least k (and whose maximum degree is at most ∆).
Let ε > 0 and let C be a monotone CNF formula with maximum degree ∆ whose clauses have arity at least k. Let x_1, …, x_n be the variables in C. Let C_i be the formula obtained from C by setting x_1 = · · · = x_i = 1 and removing all the clauses that are satisfied (i.e., all clauses that contain a variable from x_1, …, x_i), so that C_0 = C. Since Pr_{C_i}(x_{i+1} = 1) = 1/(1 + R(C_i, x_{i+1})), telescoping gives

Z(C) = ∏_{i=0}^{n−1} 1/Pr_{C_i}(x_{i+1} = 1) = ∏_{i=0}^{n−1} (1 + R(C_i, x_{i+1})).    (1)

Note that every C_i is a monotone CNF formula with maximum degree ∆ whose clauses have arity at least k. By the assumption in the lemma, we can compute (in poly(n, 1/ε) time) quantities R̂(C_i, x_{i+1}) such that

|R̂(C_i, x_{i+1}) − R(C_i, x_{i+1})| ≤ ε/(2n).    (2)

Let

Ẑ(C) := ∏_{i=0}^{n−1} (1 + R̂(C_i, x_{i+1})).    (3)

It is not hard to conclude from (1), (2) and (3) that (1 − ε)Z(C) ≤ Ẑ(C) ≤ (1 + ε)Z(C). This completes the proof.
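The self-reducibility used in this proof can be made concrete with a toy example. Since Pr_C(x = 1) = 1/(1 + R(C, x)), multiplying the factors 1 + R(C_i, x_{i+1}) along the chain of formulas recovers Z(C). The sketch below is our own exponential-time illustration (names are ours), computing each ratio by brute-force enumeration.

```python
from itertools import product

def brute_R(variables, clauses, x):
    """R(C, x) = Pr_C(x = 0) / Pr_C(x = 1), by enumerating assignments."""
    idx = {v: i for i, v in enumerate(variables)}
    zero = one = 0
    for sigma in product([0, 1], repeat=len(variables)):
        if all(any(sigma[idx[v]] for v in c) for c in clauses):
            if sigma[idx[x]]:
                one += 1
            else:
                zero += 1
    return zero / one

def Z_via_telescoping(variables, clauses):
    """Z(C) as the product of (1 + R(C_i, x_{i+1})), where C_i is obtained
    by setting the first i variables to 1 and dropping satisfied clauses."""
    z = 1.0
    remaining = list(clauses)
    for i, x in enumerate(variables):
        z *= 1 + brute_R(variables[i:], remaining, x)
        remaining = [c for c in remaining if x not in c]  # clauses satisfied by x = 1
    return z

variables = ['a', 'b', 'c']
clauses = [{'a', 'b'}, {'b', 'c'}]  # (a or b) and (b or c): 5 satisfying assignments
```

Here the three factors are 1 + 2/3, 1 + 1/2 and 1 + 1, whose product is exactly 5.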
Liu and Lu [12] established that a computation tree approach gives a recursive procedure for exactly calculating R(C, x) for any monotone CNF formula C and any variable x ∈ C. We next give the details of this recursive procedure (see [12,Lemma 5]). First, we introduce the following definitions.
Definition 8. Let C be a monotone CNF formula and let x be a variable in C. We call the variable x forced (in C) if x appears in a clause of arity 1 in C (note that in every satisfying assignment of C it must be the case that x = 1 and hence R(C, x) = 0). We call the variable x free if x does not appear in any clause of C (note that R(C, x) = 1 in this case).
Definition 9. Let C be a monotone CNF formula and let c be a clause in C. We call the clause c redundant (in C) if there is a clause c ′ in C such that c is a (strict) superset of c ′ (note that removing c from C does not affect the set of satisfying assignments of C).
We next give the details of the computation tree. The nodes in the computation tree will be pairs (C, x) such that

C is a satisfiable monotone CNF formula and x is a variable which is not forced in C.    (4)
Let (C, x) satisfy (4). We first perform a pre-processing step on C which involves (i) initially removing all of the redundant clauses, and (ii) then removing all clauses of arity 1. Note that part (ii) of the pre-processing step removes all forced variables that were present in C; at the time of the removal, forced variables appear only in clauses of arity 1, since part (i) of the pre-processing step has already removed all redundant clauses in C (and hence all clauses of arity greater than 1 that contain forced variables). Denote the formula after the completion of the pre-processing step by C̄. Note that every clause in C̄ is also a clause in the initial formula C. It follows that x is not forced in C̄. Further, since removing redundant clauses does not change the set of satisfying assignments of C and x is not forced in C, we have that R(C̄, x) = R(C, x). If x is free in C̄ (the formula after the pre-processing step), then the start node (C, x) is (declared) a leaf of the computation tree (note that in this case R(C, x) = 1). In the sequel, we assume that x is not free in C̄. Denote by {c_i}_{i∈[d]} the clauses where x occurs in C̄ and let w_i = |c_i| − 1 (note that d ≥ 1). We will use w to denote the vector (w_1, …, w_d). The variables in clause c_i other than x will be denoted by x_{i,1}, …, x_{i,w_i}. For the pair (C, x), we next construct pairs (C_{i,j}, x_{i,j}) for i ∈ [d] and j ∈ [w_i], where C_{i,j} is an appropriate subformula obtained from C̄, roughly, by hard-coding (some of) the occurrences of the variables in C̄ to either 1 or 0 (this will be explained below and will henceforth be referred to as pinning).

Precisely, for i ∈ [d], let C_i be the formula obtained from C̄ by removing clauses c_1, …, c_{i−1} (note that this has the same effect as pinning the occurrences of x in these clauses to 1) and pinning the occurrences of x in c_{i+1}, …, c_d to 0 (this corresponds to removing x from these clauses, and thus reducing their arities). For j ∈ [w_i], the formula C_{i,j} is obtained from C_i by further removing clause c_i and pinning all the occurrences of x_{i,1}, …, x_{i,j−1} to 0.

Before proceeding, let us argue that the pairs (C_{i,j}, x_{i,j}) satisfy (4) for all i ∈ [d] and j ∈ [w_i]. For such i, j, we first prove that C_{i,j} is a (satisfiable) monotone CNF formula. That is, we prove that the various pinnings in the construction of C_{i,j} from C̄ do not pin all variables of some clause of C̄ to 0. For the sake of contradiction, assume otherwise. Observe that C_{i,j} is obtained from C̄ by either removing some clauses or by pinning some occurrences of the variables to 0. Clearly, removal of clauses does not affect satisfiability, so we may focus on the effect of pinning. For i ∈ [d] and j ∈ [w_i], the only variables some of whose occurrences in C̄ get pinned to 0 are x, x_{i,1}, …, x_{i,j−1}. Since we assumed (for contradiction) that C_{i,j} is unsatisfiable, it must be the case that there exists a clause c′ in C̄ all of whose variables are (a subset of) x, x_{i,1}, …, x_{i,j−1}. It follows that c_i is redundant in C̄, since it is a strict superset of clause c′. This gives a contradiction, since the pre-processing operation ensures that C̄ has no redundant clauses. Thus, C_{i,j} is satisfiable, as wanted. Next, we show that x_{i,j} is not forced in C_{i,j}. First, observe that x_{i,j} is not forced in C̄, since the second part of the pre-processing step ensures that C̄ does not contain forced variables. Thus, the only way that x_{i,j} can be forced in C_{i,j} is if there existed a clause c′ in C̄ whose variables were x_{i,j} together with a subset of x, x_{i,1}, …, x_{i,j−1}. Since C̄ includes c_i and C̄ does not have redundant clauses, it must be the case that c′ = c_i. It remains to observe that C_{i,j} does not include (any subclause of) c_i, from which it follows that x_{i,j} is not forced in C_{i,j}.

We are now ready to state the relation between R(C, x) and the quantities R(C_{i,j}, x_{i,j}) with i ∈ [d] and j ∈ [w_i].

Lemma 10 ([12, Lemma 5]). It holds that

R(C, x) = ∏_{i=1}^{d} ( 1 − ∏_{j=1}^{w_i} R(C_{i,j}, x_{i,j}) / (1 + R(C_{i,j}, x_{i,j})) ).    (5)

Proof. The proof is identical to the proof of [12, Lemma 5] (which in turn builds on the technique of [20]); we give the proof for completeness. Recall that C̄ is the formula after the pre-processing step and that R(C̄, x) = R(C, x). We may assume that x is not free in C̄ (otherwise, it holds that R(C̄, x) = 1, which coincides with the evaluation of the right-hand side of (5) under the standard convention that the empty product evaluates to 1).

Equation (5) follows immediately from the following two equalities:

R(C̄, x) = ∏_{i=1}^{d} R(C_i, x),   R(C_i, x) = 1 − ∏_{j=1}^{w_i} R(C_{i,j}, x_{i,j}) / (1 + R(C_{i,j}, x_{i,j})).    (6)

The first equality in (6) is a consequence of a telescoping expansion of Pr_{C̄}(x=0)/Pr_{C̄}(x=1). To see this, let C′ be the formula obtained from C̄ by replacing, for all i ∈ [d], the occurrence of the variable x in clause c_i by a new variable x′_i. We have that

Pr_{C̄}(x=0)/Pr_{C̄}(x=1) = ∏_{i=1}^{d} Pr_{C′}(x′_i = 0 | x′_1 = … = x′_{i−1} = 1, x′_{i+1} = … = x′_d = 0) / Pr_{C′}(x′_i = 1 | x′_1 = … = x′_{i−1} = 1, x′_{i+1} = … = x′_d = 0),

which yields the first equality in (6) after substituting R(C_i, x) = Pr_{C_i}(x=0)/Pr_{C_i}(x=1). For the second equality in (6), observe that x appears only in clause c_i of the formula C_i, and thus (denoting by C_i\c_i the formula which is obtained from C_i by deleting clause c_i)

Pr_{C_i}(x=0)/Pr_{C_i}(x=1) = 1 − Pr_{C_i\c_i}(x_{i,1} = … = x_{i,w_i} = 0),

which proves the desired equality after substituting Pr_{C_i\c_i}(x_{i,1} = … = x_{i,w_i} = 0) = ∏_{j=1}^{w_i} R(C_{i,j}, x_{i,j}) / (1 + R(C_{i,j}, x_{i,j})).

By applying (5) recursively, it is not hard to see that one can compute the quantity R(C, x) exactly. Of course, exact computation using this scheme will typically require exponential time, so as in [12] we will stop the recursion at some (small) depth L to keep the computations feasible within polynomial time. This will yield a quantity R(C, x, L) and the hope is that, by choosing L appropriately, the error |R(C, x, L) − R(C, x)| will be sufficiently small.
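The recursive procedure of Lemma 10, together with the pre-processing step, can be prototyped and checked against brute-force enumeration. This is our own illustrative sketch (function names are ours, not the paper's implementation), and it runs in exponential time in the worst case.

```python
from itertools import product

def preprocess(clauses):
    """Pre-processing: drop redundant clauses (strict supersets of another
    clause), then drop arity-1 clauses (their variables are forced to 1)."""
    cs = set(map(frozenset, clauses))
    cs = {c for c in cs if not any(c > c2 for c2 in cs)}
    return [c for c in cs if len(c) > 1]

def R(clauses, x):
    """Exact R(C, x) via the computation-tree recursion; x must not be forced."""
    cs = preprocess(clauses)
    with_x = [c for c in cs if x in c]
    if not with_x:                            # x is free: R = 1
        return 1.0
    others = [c for c in cs if x not in c]
    result = 1.0
    for i, ci in enumerate(with_x):
        # C_i minus c_i: clauses c_1..c_{i-1} removed, x pinned to 0 in the rest
        base = others + [c - {x} for c in with_x[i + 1:]]
        prod, pinned = 1.0, set()
        for y in sorted(ci - {x}):
            # C_{i,j}: additionally pin the earlier neighbours of x to 0
            r = R([c - pinned - {x} for c in base], y)
            prod *= r / (1 + r)
            pinned.add(y)
        result *= 1 - prod
    return result

def brute_R(clauses, x):
    """The same ratio by direct enumeration, for comparison."""
    vs = sorted(set().union(*clauses) | {x})
    idx = {v: i for i, v in enumerate(vs)}
    zero = one = 0
    for sigma in product([0, 1], repeat=len(vs)):
        if all(any(sigma[idx[v]] for v in c) for c in clauses):
            if sigma[idx[x]]:
                one += 1
            else:
                zero += 1
    return zero / one

example = [{'a', 'b'}, {'b', 'c'}, {'a', 'c', 'd'}]  # R(example, 'a') = 1/2
```

On the example, the two branches of the recursion contribute the factors 3/5 and 5/6, whose product matches the enumerated ratio 1/2.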
In light of (5), a natural way to define R(C, x, L) for integer L is the following: set R(C, x, L) := 1 if L ≤ 0 or if x is free after the pre-processing step, and otherwise let

R(C, x, L) := ∏_{i=1}^{d} ( 1 − ∏_{j=1}^{w_i} R(C_{i,j}, x_{i,j}, L − 1) / (1 + R(C_{i,j}, x_{i,j}, L − 1)) ).

(Footnote 7: Note that the value 1 of R(C, x, L) when L ≤ 0 is somewhat arbitrary, since L ≤ 0 corresponds to stopping the recursion. Our choice of the value 1 will be convenient for technical reasons that will become apparent in the proof of the upcoming Lemma 20.)

It is immediate that when the formula C has maximum degree bounded by a constant and, further, every clause has arity also bounded above by a constant, one can compute R(C, x, L) in time polynomial in n whenever L = O(log n). To account for formulas where the arities of the clauses can be arbitrarily large (but still where the degrees of variables are bounded by a constant), one needs to be more careful with clauses of large arity (i.e., when their arity as a function of n is ω(1), say log n). As in [12], we will account for this more general setting by pruning the recursion earlier whenever we encounter clauses with large arity.

Definition 11. For an integer w ≥ 1, let l_w := ⌈log_6(w + 1)⌉. Note that l_1 = … = l_5 = 1. For integer L, we set R(C, x, L) := 1 if L ≤ 0 or if x is free in the pre-processed formula, and otherwise

R(C, x, L) := ∏_{i=1}^{d} ( 1 − ∏_{j=1}^{w_i} R(C_{i,j}, x_{i,j}, L − l_{w_i}) / (1 + R(C_{i,j}, x_{i,j}, L − l_{w_i})) ).

The particular choice of the logarithm base in Definition 11 is not very important as long as it is a big enough constant. The quantity R(C, x, L) is typically called a "message" (because it gets passed up the computation tree).

Remark 12.
For formulas C with a variable x which is not forced in C, we have the lower bound R(C, x) ≥ (1/2)^{d_x(C)}. The bound is simple to see using (5) and the fact that z/(1 + z) ≤ 1/2 for z ∈ [0, 1], so that each of the (at most d_x(C)) factors on the right-hand side of (5) is at least 1 − (1/2)^{w_i} ≥ 1/2. Similarly, for all integers L and all nodes (C, x) in the computation tree we have the bound R(C, x, L) ≥ (1/2)^{d_x(C)}.
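The lower bound of Remark 12 can be sanity-checked numerically. In this small sketch (our own code), the variable a has degree 2, so the remark promises R(C, a) ≥ (1/2)^2 = 1/4.

```python
from itertools import product

def brute_R(clauses, x):
    """R(C, x) = Pr_C(x = 0) / Pr_C(x = 1), by enumeration."""
    vs = sorted(set().union(*clauses) | {x})
    idx = {v: i for i, v in enumerate(vs)}
    zero = one = 0
    for sigma in product([0, 1], repeat=len(vs)):
        if all(any(sigma[idx[v]] for v in c) for c in clauses):
            if sigma[idx[x]]:
                one += 1
            else:
                zero += 1
    return zero / one

clauses = [{'a', 'b'}, {'a', 'c'}, {'b', 'c'}]
r = brute_R(clauses, 'a')   # 1/3, comfortably above the guaranteed 1/4
```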

Proof Outline
We want to guarantee that the error |R(C, x, L) − R(C, x)| is exponentially small in L. Notice that if we run the recursion long enough, it computes the true value; namely, R(C, x, ∞) = R(C, x). More precisely, we will prove the following two lemmas, which correspond to the settings of Theorem 2 and Theorem 3, respectively. Recall that C_{k,∆} is the set of all monotone CNF formulas which have maximum degree at most ∆ and whose clauses have arity at least k. Our proof will use the following constant.

Lemma 14. There exists a constant τ > 0 such that for every C ∈ C_{3,6}, every variable x in C, and every integer L, |R(C, x, L) − R(C, x)| ≤ e^{−τL}.

Lemma 15. Let k and ∆ be two integers such that k ≥ ∆ and ∆ ≥ 200. There exists a constant τ > 0 such that for every C ∈ C_{k,∆}, every variable x in C, and every integer L, |R(C, x, L) − R(C, x)| ≤ e^{−τL}.

Our proof uses correlation decay techniques together with a new method which accounts for the shape of the computation tree. Lemma 15 is technically simpler and is proved in Section 4 to better illustrate the idea. Lemma 14 is proved in Section 5. In the rest of this section, we give an overview of our overall proof strategy.
To analyze the error of the recursion, the standard approach in the literature so far has been to show that, for a node (C, x) in the computation tree, the quantity |R(C, x, L) − R(C, x, ∞)| is bounded by α · max_{i,j} |R(C_{i,j}, x_{i,j}, L − 1) − R(C_{i,j}, x_{i,j}, ∞)| for some constant 0 < α < 1. This allows one to inductively deduce that |R(C, x, L) − R(C, x, ∞)| decays exponentially in L. This approach has been extremely successful when strong spatial mixing holds [16,11,12,15,21,13].
In fact, this step-wise decay seldom holds if we track R(C, x, L) directly. Instead, the analysis is usually done by tracking Φ(R(C, x, L)) for an appropriate potential function Φ. In particular, let Φ : (0, 1] → R be a potential function that satisfies: Φ is continuously differentiable on (0, 1], and ϕ := Φ′ satisfies ϕ(z) > 0 for z ∈ (0, 1]. (8) The usual approach is to show that |Φ(R(C, x, L)) − Φ(R(C, x, ∞))| decays exponentially in L, which is sufficient to imply lemmas like Lemma 14 and Lemma 15. In our setting, this inductive approach is problematic since, inside the computation tree, we are faced with the possibility that the formula at the root of a subtree could have many arity-2 clauses. For ∆ ≥ 6, these subtrees prohibit the application of the above proof scheme since they lie in the non-uniqueness regime, and hence the desired step-by-step decay is no longer present, regardless of the choice of the potential function.
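To illustrate how a potential function restores step-wise decay, consider the standard univariate hardcore recursion f(x) = λ/(1 + x)^d with λ = 1 and d = 2 (a textbook two-spin example, not the hypergraph recursion of this paper). On the invariant interval [f(1), f(0)] = [1/4, 1] the raw derivative exceeds 1 near the left endpoint, so no naive contraction holds; tracking Φ(R) for a potential with ϕ(x) = Φ′(x) = 1/√(x(1 + x)) (one common choice) makes the amplification factor ϕ(f(x))|f′(x)|/ϕ(x) drop below 1 everywhere on the interval. A numerical sketch:

```python
import math

f = lambda x: 1.0 / (1 + x) ** 2              # hardcore recursion, lambda = 1, d = 2
df = lambda x: 2.0 / (1 + x) ** 3             # |f'(x)|
phi = lambda x: 1.0 / math.sqrt(x * (1 + x))  # phi = Phi', a common potential choice

# Grid over the invariant interval [f(1), f(0)] = [0.25, 1].
grid = [0.25 + 0.75 * i / 10000 for i in range(10001)]
raw = max(df(x) for x in grid)                             # > 1: no naive contraction
amplified = max(phi(f(x)) * df(x) / phi(x) for x in grid)  # < 1: decay for Phi(R)
```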
While arity-2 clauses are problematic, clauses with larger arities do at least lead to good decay of correlation in a single step. In general, the larger the arity, the better the decay. Thus, our approach is to do an amortised analysis. In a single step, we track both the one-step decay of correlation and the number of variables in the current formula that are pinned to 0. These 0-pinnings will decrease the effective arity of clauses and will later lead to worse decay.
More formally, we will track a specific quantity m(C, x, L) which is assigned to each node in the computation tree. Let C* be the original monotone CNF formula and let (C, x) be a node in the computation tree. As explained in Section 2.3, each clause c′ of C is obtained from a clause c of C* by pinning a certain number of variables to 0 (possibly none), which effectively is the same as removing those variables from the clause. We call these 0-pinnings deficits and let max{0, k − |c′|} be the number of deficits of c′. Note that a clause of arity larger than k is considered to have no deficit, although some of its variables may have been pinned to 0. Definition 16. Let D(C) = Σ_{c′∈C} max{0, k − |c′|} denote the sum of the deficits of the clauses in C.
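Definition 16 is straightforward to compute; a small sketch (representing clauses as tuples of variable indices):

```python
def deficits(clauses, k):
    """D(C) = sum over clauses c' of max(0, k - |c'|): a clause whose arity
    has dropped below k through 0-pinnings contributes its shortfall, while
    clauses of arity at least k contribute nothing (Definition 16)."""
    return sum(max(0, k - len(c)) for c in clauses)

# In a formula from C_{k,Delta} no clause has arity below k, so D = 0 at the
# root; pinnings performed in the computation tree can create shorter clauses.
```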
Observe that if a clause c of C* does not show up in C, it does not contribute any deficits. For any node (C, x) in the computation tree, we define the quantity m(C, x, L), where δ ∈ (0, 1) is a constant that we will choose later, and where the potential function Φ will be specified shortly in Definition 17. Crucially, the root formula C* satisfies m(C*, x, L) = Φ(R(C*, x, L)), since at the root no variable is pinned yet and D(C*) = 0. Thus, the key step is to show that the quantity |m(C, x, L) − m(C, x, ∞)| decays exponentially with L; we will show that, for an arbitrary node (C, x) in the computation tree, inequality (9) holds, where α = 1 − 10^{−4} is from Definition 13. In previous applications of the correlation decay technique, the ordering of the children of each node (C, x) is usually arbitrary. Since we want to take the shape of the computation tree into consideration, this ordering becomes important to us. We will order the clauses in order of increasing size, except that we leave the arity-2 clauses to the end.
Unfortunately, the quantity m(C, x, L) is more complicated than the plain message R(C, x, L), and it is even more complicated than Φ(R(C, x, L)), since it incorporates combinatorial information about the formula C and thus does not satisfy a simple recursion (unlike R(C, x, L)). Nevertheless, we are able to define a multi-variable quantity κ_* (see (12)) and we will show (see Lemma 20) that when κ_* ≤ 1, inequality (9) holds.
We will use the following potential function, which satisfies (8) as required. In general, the choice of an appropriate potential function is guided by an "educated guess". Definition 17. Let χ = 1/2 and ψ = 13/10, and define Φ accordingly.

Remark 18.
The exact values of χ and ψ do not matter at this stage, but it is important that 0 < χ ≤ 1 and ψ > 1. For such values of χ and ψ, Φ^{−1} exists and is uniquely defined on the range of Φ(z) for z ∈ (0, 1].

A general framework to bound the error
First, let us calculate how the number of deficits changes in one step of the recursion. To avoid trivialities, we assume k ≥ 2. Let (C, x) be a node in the computation tree. As in Section 2.3, we first perform a pre-processing step on C which removes redundant clauses and clauses of arity 1, producing a new formula C̃. Every clause in C̃ is a clause of C, so D(C̃) ≤ D(C). Also, x is neither forced nor free (otherwise, (C, x) is a leaf of the computation tree). Let d = d_x(C̃) and let c_1, . . . , c_d be the clauses of C̃ in which x occurs.
Recall that w_i = |c_i| − 1. As we mentioned in Section 3, the order of c_1, . . . , c_d is important. We will order the clauses in order of increasing size, except that we leave the arity-2 clauses to the end. Here is some notation to describe the ordering. Let b_ℓ denote the number of clauses amongst c_1, . . . , c_d with arity ℓ. We will use the variables b′_ℓ to denote cumulative sums for ℓ > 2, so b′_2 = 0 and, for ℓ ≥ 3, b′_ℓ = b′_{ℓ−1} + b_ℓ. We order the clauses so that, for ℓ ≥ 3, the clauses c_{b′_{ℓ−1}+1}, . . . , c_{b′_ℓ} have arity ℓ, while the final clauses c_{d−b_2+1}, . . . , c_d have arity 2. Let s_i be the sum of the deficits of the clauses c_1, . . . , c_i. We will now consider how the deficits change when we construct the node (C_{i,j}, x_{i,j}) from (C̃, x) according to the method described in Section 2.3, where j ∈ [w_i].
• The arity-2 clauses in C̃ are always removed in the construction of C_{i,j}, resulting in a loss of deficit of b_2(k − 2).
• The clauses c_1, . . . , c_i are also removed in the construction of C_{i,j}, resulting in an additional loss of deficit of s_{min(i, d−b_2)}. (The minimum is there to avoid double-counting if i > d − b_2, since in that case some of these clauses have arity 2 and have already been counted.) • The occurrences of x are pinned to 0 in the clauses c_{i+1}, . . . , c_d. Consider some t ∈ {i + 1, . . . , d}. If the arity of c_t is greater than k, then this pinning does not cause any increase in deficit. Also, if the arity of c_t is 2, then the clause will be removed, so there is no increase in deficit. Thus, the increase in deficit from these pinnings is at most the number of clauses among c_{i+1}, . . . , c_d with arity in {3, . . . , k} (each such pinning increases the deficit by exactly 1). Putting these observations together, we conclude the bound on the change in deficits. Recall that the recursion for R(C, x, L) depends on the function F_{d,w}(r) implicitly defined by (7), i.e., F_{d,w}(r) = Π_{i=1}^{d} ( 1 − Π_{j=1}^{w_i} r_{i,j}/(1 + r_{i,j}) ), where r_{i,j} denotes the message of the child node (C_{i,j}, x_{i,j}) (at the appropriately decremented depth). The m(C, x, L) variables also satisfy a recursion, which could be made explicit by mapping them back to the R(C, x, L) variables, though we will not directly analyse this recursion. Instead, we will define a quantity κ_*^{d,w}(r) which tracks the rate at which |m(C, x, L) − m(C, x, ∞)| decays in the recursion. Specifically, define κ_*^{d,w}(r) as follows.
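The bookkeeping above (ordering by increasing arity with the arity-2 clauses last, together with the prefix deficit sums s_i) can be sketched as follows (function names are ours):

```python
def order_and_prefix_deficits(arities, k):
    """Order the arities of the clauses containing x: increasing, with the
    arity-2 clauses moved to the end.  Return the ordered arities, the counts
    b[l] of clauses of arity l, and the prefix deficit sums s_i (the deficit
    of a clause of arity a is max(0, k - a))."""
    ordered = sorted(a for a in arities if a != 2) + [2] * arities.count(2)
    b = {}
    for a in arities:
        b[a] = b.get(a, 0) + 1
    s, running = [], 0
    for a in ordered:
        running += max(0, k - a)
        s.append(running)
    return ordered, b, s

ordered, b, s = order_and_prefix_deficits([2, 4, 3, 2, 5], k=6)
# ordered = [3, 4, 5, 2, 2]: the d - b[2] = 3 non-arity-2 clauses come first.
```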
The main step in the proofs of Lemma 14 and Lemma 15 will be to bound κ_*^{d,w}(r). By construction, the entries of w are in increasing order, apart from the 1s at the end, and the bound on κ_*^{d,w}(r) will use this fact, so we give the following definition.
Definition 19. Let w_0 = 2. A vector w = w_1, . . . , w_d is suitable if its entries are positive integers and there is a t ∈ {0, . . . , d} such that for all j in {1, . . . , t}, w_j ≥ w_{j−1}, and for all j in {t + 1, . . . , d}, w_j = 1. Given a suitable vector w, we use the following global notation (which depends implicitly on w): b_ℓ is the number of entries of w which are equal to ℓ − 1.
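A direct check of Definition 19 (a sketch; the function name is ours):

```python
def is_suitable(w):
    """Definition 19: all entries are positive integers and, for some t,
    the prefix w_1, ..., w_t is non-decreasing starting from the convention
    w_0 = 2 (so in particular w_1 >= 2 when t >= 1), while the remaining
    entries w_{t+1}, ..., w_d are all equal to 1."""
    if any(not isinstance(x, int) or x < 1 for x in w):
        return False
    for t in range(len(w) + 1):
        head = [2] + list(w[:t])  # w_0 = 2 by convention
        if all(head[j] >= head[j - 1] for j in range(1, len(head))) \
                and all(x == 1 for x in w[t:]):
            return True
    return False
```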
Lemma 20. Suppose that ∆ and k are integers with ∆ ≥ 2 and k ≥ 3. Suppose that there are constants 0 < δ < 1 and U > 0 such that, for all 1 ≤ d ≤ ∆, all suitable w = w_1, . . . , w_d, and all r in the relevant range, the bound (13) holds. Then there exists a constant τ > 0 such that, for every C ∈ C_{k,∆}, every variable x ∈ C, and every integer L, inequality (14) holds. Proof. We will show that there is a constant τ̃ > 0 such that for all such C, x and L, inequality (15) holds. Assuming (15) for the moment, let us conclude (14). Consider C ∈ C_{k,∆}. We may assume that L > 0, since for L = 0 the inequality (14) holds for all sufficiently large τ (any τ ≥ 1 works). Consider any x ∈ C and consider the computation tree rooted at (C, x). By the definition of m(·, ·, ·) and since by assumption D(C) = 0, we have that m(C, x, L) = Φ(R(C, x, L)) and m(C, x, ∞) = Φ(R(C, x, ∞)). Let η = (1/2)^{∆−1}, and let K_Φ^min := min_{x∈[η,1]} ϕ(x) and K_Φ^max := max_{x∈[η,1]} ϕ(x).
Since ϕ is continuous and ϕ(x) > 0 for all x ∈ [η, 1], both K_Φ^min and K_Φ^max are positive. We then have the bound (17). To see (17), we may assume that R(C, x, L) ≠ R(C, x, ∞) (otherwise the inequality holds with equality), in which case the inequality follows by an immediate application of the Mean Value Theorem to the function Φ. Combining (16) and (17) with (15) yields (14) with τ = τ̃/K_Φ^min, as desired.
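The Mean Value Theorem bound used here can be written out explicitly (a reconstruction from the surrounding definitions): for a, b ∈ [η, 1] there is a ξ between a and b with Φ(a) − Φ(b) = ϕ(ξ)(a − b), and hence

```latex
\[
  K^{\min}_{\Phi}\,\lvert a-b\rvert \;\le\; \lvert \Phi(a)-\Phi(b)\rvert \;\le\; K^{\max}_{\Phi}\,\lvert a-b\rvert ,
  \qquad\text{where } K^{\min}_{\Phi}=\min_{x\in[\eta,1]}\varphi(x),\quad
  K^{\max}_{\Phi}=\max_{x\in[\eta,1]}\varphi(x).
\]
```

Applied with a = R(C, x, L) and b = R(C, x, ∞), the left inequality converts decay of |Φ(R(C, x, L)) − Φ(R(C, x, ∞))| into decay of |R(C, x, L) − R(C, x, ∞)|, at the cost of the factor 1/K_Φ^min.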
To prove (15), we will first show a slightly weaker claim. Namely, for all nodes (C, x) in the computation tree where C ∈ C_{k,∆} and x ∈ C is a variable of degree at most ∆ − 1, and for all integers L, inequality (18) holds. For L ≤ 0, we have the required bound, where in the first inequality we used that δ ∈ (0, 1], that D(C) ≥ 0, and an application of the Mean Value Theorem analogous to the one used in (17), while in the second inequality we used that for L ≤ 0 it holds that R(C, x, L) = 1 (by definition) and 0 ≤ R(C, x, ∞) ≤ 1.
To prove (18) for integers L > 0 we proceed by induction on L. Namely, we assume that L > 0 and that (18) holds for all values smaller than L (the base cases L ≤ 0 have already been shown).
Recall from (4) that x is not forced in C and that C̃ is the formula obtained by removing the redundant clauses from C. We may assume that x is not free in C̃; otherwise, observe that R(C, x, L) = R(C, x, ∞) = 1 and thus m(C, x, L) = m(C, x, ∞), so that (18) holds. We will thus focus on variables x which appear only in (a non-zero number of) clauses of C̃ of arity ≥ 2.
Denote by r^{(1)} the vector whose coordinates are given by the truncated messages of the children, and by r^{(2)} the vector whose coordinates are given by the corresponding exact messages. Note that η·1 ≤ r^{(1)}, r^{(2)} ≤ 1 (cf. property (4) for the nodes of the computation tree, (7), footnote 7 and Remark 12).
For θ ∈ [0, 1], let z_{i,j}(θ) := θ·Φ(r^{(1)}_{i,j}) + (1 − θ)·Φ(r^{(2)}_{i,j}) and let r_{i,j}(θ) := Φ^{−1}(z_{i,j}(θ)) (note that the inverse Φ^{−1} exists and is uniquely defined on the interval Φ([η, 1]), cf. Remark 18 and (8)). Denote by r(θ) the vector whose coordinates are r_{i,j}(θ) and note that η·1 ≤ r(θ) ≤ 1. Finally, let h(θ) := Φ(F(r(θ))). Observe that h is differentiable for all values of θ ∈ [0, 1]. By applying the Mean Value Theorem to the function h(θ), we obtain that there exists θ_0 ∈ (0, 1) such that h(1) − h(0) = h′(θ_0). It follows that the error at (C, x) is bounded by a weighted sum of the errors at the children, where the inequality follows from the triangle inequality and the fact (see Definition 17) that ϕ = Φ′ satisfies (8). Now note that for all i ∈ [d] and j ∈ [w_i], we have by induction that (21) holds. From (20) and the bounds (21), we obtain (22), where in the inequality we used that δ ∈ (0, 1]. Combining (22), (23) and the assumption (13) completes the inductive step. Finally, we prove (15) with τ̃ := K_Φ^max · max{U, 1}, where U is the constant in assumption (13). For x of degree at most ∆ − 1, we get (15) immediately from (18). Now suppose that (C, x) is such that x has degree ∆ in C. Note that, for a node (C′, x′) in the computation tree, x′ may have degree d = ∆ in C′ only if (C′, x′) is the root of the tree. It follows that the children of the node (C, x), say (C_{i,j}, x_{i,j}) with i ∈ [d] and j ∈ [w_i], are such that x_{i,j} has degree at most ∆ − 1 in C_{i,j}. Hence, by applying (18), we obtain that (21) holds for all i, j, and hence (as before) we deduce that (22) and (23) hold as well. Inequality (15) now follows since by assumption (13) we have that κ_*^{d,w}(r) ≤ max{U, 1} for d ≤ ∆. This concludes the proof of Lemma 20.
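The derivative h′(θ) used in this step can be reconstructed from the chain rule (using z′_{i,j}(θ) = Φ(r^{(1)}_{i,j}) − Φ(r^{(2)}_{i,j}) and r′_{i,j}(θ) = z′_{i,j}(θ)/ϕ(r_{i,j}(θ))):

```latex
\[
  h'(\theta)
  \;=\;
  \varphi\big(F(\mathbf{r}(\theta))\big)
  \sum_{i\in[d]}\sum_{j\in[w_i]}
  \frac{\partial F}{\partial r_{i,j}}\bigg|_{\mathbf{r}(\theta)}
  \cdot
  \frac{\Phi\big(r^{(1)}_{i,j}\big)-\Phi\big(r^{(2)}_{i,j}\big)}{\varphi\big(r_{i,j}(\theta)\big)} ,
\]
```

so that, evaluating at θ_0 and applying the triangle inequality, each child's contribution appears with the weight ϕ(F(r(θ_0)))·|∂F/∂r_{i,j}|/ϕ(r_{i,j}(θ_0)).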
We now state two technical lemmas which will be proved later in the paper. These lemmas verify the premise of Lemma 20 in the settings of Theorems 2 and 3. Note from Equation (12) that κ_*^{d,w}(r) depends on the global quantity k and on various quantities (depending on w) which are defined in Definition 19.
Lemma 21. Let k and ∆ be two integers such that k ≥ ∆ and ∆ ≥ 200. There are constants 0 < δ < 1 and U > 0 such that, for all 1 ≤ d ≤ ∆, all suitable w = w_1, . . . , w_d, and all r satisfying 0 < r ≤ 1, the premise (13) of Lemma 20 holds. Lemma 22. Let ∆ = 6 and k = 3. There are constants 0 < δ < 1 and U > 0 such that the premise (13) of Lemma 20 holds in this setting as well. Lemma 21 will be proved in Section 4 and Lemma 22 will be proved in Section 5. Using these lemmas and Lemma 20, we can prove Lemma 14 and Lemma 15.

Lemma 14.
There exists a constant τ > 0 such that for every C ∈ C_{3,6}, every variable x ∈ C, and every integer L, the claimed bound holds. Proof. The lemma follows immediately from Lemma 22 and Lemma 20.
Lemma 15. Let k and ∆ be two integers such that k ≥ ∆ and ∆ ≥ 200. There exists a constant τ > 0 such that for every C ∈ C_{k,∆}, every variable x in C, and every integer L, the claimed bound holds. Proof. The lemma follows immediately from Lemma 21 and Lemma 20.

Proof of the main theorems
In this section, we give the proofs of Theorems 2 and 3, which we restate here for convenience.
Theorem 3. Let k and ∆ be two integers such that k ≥ ∆ and ∆ ≥ 200. Then there is an FPTAS for the problem #HyperIndSet(k, ∆).
We have now finished the proofs of Theorems 2 and 3, except that we have not yet proved Lemmas 21 and 22, which we used to bound κ_*^{d,w}(r) in the proofs of Lemmas 14 and 15. Lemma 21 will be proved in Section 4 and Lemma 22 will be proved in Section 5. The proofs bound the multivariate decay rate function κ_*^{d,w}(r). This is an optimisation problem that is quite complicated to solve; moreover, we need to solve it for all possible suitable vectors w. Bounding κ_*^{d,w}(r) is the technical core of our proof.

Application to counting dominating sets
In this section, we use Theorems 2 and 3 to obtain Corollary 4, which we restate here for convenience.
The corollary follows from the observation that a dominating set in a ∆-regular graph is defined by a collection of constraints of arity ∆ + 1 with each variable occurring in ∆ + 1 constraints. The details are spelled out in the following proof.
Proof of Corollary 4. Let ∆ be an integer satisfying either 2 ≤ ∆ ≤ 5 or ∆ ≥ 199. Let G be a ∆-regular graph and denote by #DomSets(G) the number of dominating sets in G.
We conclude the proof by giving the construction of H. For a vertex v ∈ V of G, denote by v_1, . . . , v_∆ its neighbours in G (note that there are exactly ∆ of those since G is ∆-regular) and let e_v = {v, v_1, . . . , v_∆}. Then H is the hypergraph with vertex set V and hyperedge set {e_v : v ∈ V}. This completes the construction of H.
It remains to show that the number of dominating sets in the graph G is equal to the number of independent sets in the hypergraph H. It suffices to show that S ⊆ V is an independent set of H iff V∖S is a dominating set of G. Indeed, if S is an independent set of H, then for every vertex v ∈ S, at least one of v_1, . . . , v_∆ is not in S (since e_v is a hyperedge of H), and hence V∖S is a dominating set of G. Similarly, if V∖S is a dominating set of G, then for every vertex v ∈ S, at least one of v_1, . . . , v_∆ is in V∖S, and hence S is an independent set of H (since each hyperedge e_v of H contains at least one vertex which does not belong to S).
This completes the proof.
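The correspondence can be checked by brute force on a small instance; here G = C_5, the 5-cycle (2-regular, so within the range 2 ≤ ∆ ≤ 5 of the corollary), with hyperedges e_v = {v} ∪ N(v):

```python
from itertools import chain, combinations

def powerset(vs):
    return chain.from_iterable(combinations(vs, r) for r in range(len(vs) + 1))

# G = C_5, the 5-cycle.
n = 5
neighbours = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
hyperedges = [frozenset([v] + neighbours[v]) for v in range(n)]  # e_v = {v} u N(v)

def is_dominating(S):
    S = set(S)
    return all(v in S or any(u in S for u in neighbours[v]) for v in range(n))

def is_hyper_independent(S):
    S = set(S)
    return not any(e <= S for e in hyperedges)  # no hyperedge fully inside S

n_dom = sum(1 for S in powerset(range(n)) if is_dominating(S))
n_ind = sum(1 for S in powerset(range(n)) if is_hyper_independent(S))
# The bijection S -> V \ S gives n_dom == n_ind (both equal 21 for C_5).
```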

Bounding the decay rate for large ∆
This section is devoted to proving Lemma 21. Let k and ∆ be two integers such that k ≥ ∆ and ∆ ≥ 200. We start by setting up some upper bounds on the function κ_*^{d,w}(r), which is defined in (12) using notation from Definition 19. Consider the following definitions, which apply to suitable vectors w.
We first argue an upper bound on κ_*^{d,w}(r) in terms of a simpler quantity κ^{d,w}(r). Recall from Definition 13 that α = 1 − 10^{−4}. For brevity, we will denote F_{d,w}(r) by F(r). Finally, we define the constant δ (which depends on ∆).
In order to bound κ^{d,w}(r), an important special case is when b_2 = 0. Indeed, as we will see soon, handling this special case yields an upper bound on κ^{d,w}(r) in the general case. Define the following quantity for suitable vectors w.
The next lemma gives an upper bound on the quantity κ^{d,w}(r).
In the case d = ∆, for w = w_1, . . . , w_d a suitable vector with b_2 = 0 and all r satisfying the stated bounds, we obtain the corresponding estimate. Note that in the statement of Lemma 24, the w_i's are positive integers in non-decreasing order, all of which are at least 2. Lemma 24 will be proved in the remainder of Section 4. First we use it to prove Lemma 21, which we restate for convenience.
We want to move the term in front of the parentheses in (27) inside them, where we use the definition of κ^{d,w} from (26) and Lemma 24 (using the fact that d ≤ ∆ − 1) in the last step. In Section 8.1.1, we verify using Mathematica's Resolve function that the required inequality holds for any 0 ≤ r ≤ 1. Therefore (28) simplifies as claimed for any integer b_2 ≥ 0. This finishes the proof. The bound in the case d = ∆ follows from the same argument; the only difference is that in equation (28) we use the d = ∆ case of Lemma 24 and hence obtain a weaker (constant) bound on κ^{d,w}(r).

Useful lemmas for the proof of Lemma 24
We have now finished the proof of Theorem 3 apart from the proof of Lemma 24, and the remainder of Section 4 is devoted to that proof. First, in this section, we prove some useful lemmas. To make them easier to read, we list some useful constants and functions in Tables 1 and 2. The first two lemmas are merely technical. Lemma 25. Let f(r) := (ψ − r^χ)/(1 + r) for r ∈ (0, 1] and parameterize t in terms of r via e^t = r/(1 + r). Then f, viewed as a function f̃ of t, is concave. Proof. Note that since r ranges from 0 to 1, t ranges from −∞ to ln(1/2). Further, we have that r = e^t/(1 − e^t), so we want to show that the function f̃(t) := f(e^t/(1 − e^t)) is concave for all t ≤ ln(1/2). We do this by verifying that f̃′′(t) < 0 for t ≤ ln(1/2) using Mathematica's Resolve function; see Section 8.1.2 for details.
We now turn to the problem of upper bounding κ^{d,w}(r), which is what we need to do to prove Lemma 24. Recall from Definition 19 that s_i = Σ_{t=1}^{i} max(0, k − w_t − 1). Thus, for i ≥ 1 and d ≤ ∆ − 1, we have the corresponding bound. Hence, from the definition of κ^{d,w}(r), we obtain (30). Recall from (11) that t_{i,j} = ln(r_{i,j}/(1 + r_{i,j})). Then we can express the right-hand side of (30) as a function of the t_{i,j}'s as follows.
where f̃ is as defined in Lemma 25. By Lemma 25, f̃ is concave, so Jensen's inequality applies, showing that the quantity is maximised when the t_{i,j} within each clause are equal. So we can replace each t_{i,j} with u_i/w_i, where u_i := Σ_{j=1}^{w_i} t_{i,j}, without decreasing the right-hand side. Equivalently, we can replace each r_{i,j} with a quantity r_i chosen so that (r_i/(1 + r_i))^{w_i} = Π_{j=1}^{w_i} r_{i,j}/(1 + r_{i,j}). Also notice that δ^k ≤ δ^∆ = c = 0.7. From (30), we thus obtain a bound of the form (31), where the function g(y, w) and the range of the y_i are as specified below. The next lemma gives an upper bound on g(y, w).
Proof. The equality in (33) is immediate by substituting y = (1 − 2^{−w})^{1/2} into g(y, w), so we only need to argue for the inequality in (33). We make the change of variable t = (1 − y²)^{1/w}, so that the range of t is (0, 1/2]. The inequality can then be written as (34). We verify (34) for w = 2, 3, 4, 5 and all t ∈ (0, 1/2] using Mathematica's Resolve function; see Section 8.1.3 for details. To obtain the lemma for w ≥ 6, it then suffices to establish the analogous inequality, which can be massaged into a form where it suffices to show that g is increasing for t ∈ (0, 1/2]. We calculate its derivative, which is positive for all t ∈ (0, 1) since the function f(t) := wt^{w+1} − (w + 1)t^w + 1 satisfies f(1) = 0 and f′(t) = w(w + 1)t^{w−1}(t − 1) < 0 for t ∈ (0, 1), so that f(t) > 0 for all t ∈ (0, 1). For every positive integer w, define ξ(w, 0) = 0. For every positive integer d, define ξ(w, d) as follows. In order to upper bound κ^{d,w}(r), it is going to be useful to know when ξ(w, d) is decreasing in w. This is captured by the following lemma.
Proof. Recall from Definition 11 that l_w = ⌈log_6(w + 1)⌉. Thus, l_w − l_{w+1} ∈ {0, −1}. Using this fact and the fact that 0 < α < 1, we obtain (36), where D_{w,d} is as defined there. Note that when w ≥ k − 1, D_{w,d} is trivially equal to 1, so for this range of w the right-hand side of (36) is upper bounded by 1 (since w ≥ k − 1 ≥ 199 and, from α = 1 − 10^{−4} and c = 0.7, we have (2αc)^{−1} < 3/4). Henceforth, we will thus focus on the range 2 ≤ w ≤ k − 3. To bound the right-hand side of (36), we will use the bound (37) on D_{w,d}. We will justify the bound in (37) shortly; for now, note that the desired inequality ξ(w + 1, d)/ξ(w, d) ≤ 1 will follow from (38). We verify (38) for w = 2, 3, 4, 5, and also show that for w = 6 we have M_w(2αc)^{−1}(1 + w^{−1}) < 1 (for w ≥ 7 the result follows by monotonicity); see Section 8.1.4 for the code. We finish the proof by justifying the bound in (37). Note that for all 2 ≤ w ≤ k − 3 we have the relevant estimate. To prove (37) for 2 ≤ w ≤ 5, we use the fact that k ≥ ∆ ≥ 200, the definition δ^∆ = c, and the fact that the function x/(1 − x) is increasing for x ∈ (0, 1); see Section 8.1.4 for the calculation in the last inequality. To prove (37) for 6 ≤ w ≤ k − 3, we will show (40). To justify (40), note that the equality is immediate using δ^∆ = c. Also, the very last (strict) inequality in (40) is immediate using 0 < δ < 1. In the following, we may thus focus on proving the first two inequalities in (40). To justify the first inequality in (40), i.e., D_{w,d} ≤ D_{w,∆} for 1 ≤ d ≤ ∆, just use Lemma 26 for the function f_1 with x = δ^{k−w−2} and y = δ^{k−w−1}. For the second inequality in (40), i.e., D_{w,∆} ≤ D_{k−3,∆} for 6 ≤ w ≤ k − 3, note that, by δ^∆ = c, D_{w,∆} can be rewritten appropriately. Then D_{w,∆} ≤ D_{k−3,∆} for w ≤ k − 3 (i.e., w′ ≥ 1) follows from Lemma 26 for the function f_2 with x = c and y = δ. This concludes the proof of the bound in (37), which completes the proof of Lemma 28.
In order to upper bound κ d,w (r) it is going to be useful to have an upper bound on ζ(w, d) and this is done in the following lemma.
Proof. We wish to find an upper bound for ζ(w, d). The task is difficult because the vector w may have up to d different entries. We say that a vector w′ = (w′_1, . . . , w′_d) of integers dominates w if the following are true.
So a good way to find an upper bound for ζ(w, d) is to find a "simple" vector w ′ which dominates w and then find an upper bound for ζ(w ′ , d). To do this, we define several classes of vectors w, depending on how "desirable" they are for proving upper bounds.
• A vector w is "fairly good" if it is partly good and either w_1 = · · · = w_d or w_d ≤ k − 1.
• A vector w is "very good" if it is fairly good and every w_i is in {w_1, k − 1}.
The vector w that we start with (in the statement of the lemma) is partly good, but it will be easiest to prove upper bounds on ζ(w, d) for very good vectors w. Thus, we will define two transformations.
1. The first transformation starts with a partly good vector w. If w is fairly good, then the transformation does nothing. Otherwise, it produces a partly good vector w′ ≠ w which dominates w.
2. The second transformation starts with a fairly good vector w. If w is very good, then the transformation does nothing. Otherwise, it produces a fairly good vector w′ ≠ w which dominates w.
Both transformations make progress in the sense that there exists an i such that w′_i < w_i. If we start with any partly good vector w and repeatedly apply the first transformation then, after a finite number of transformations, we must obtain a fairly good vector w′ which dominates w. (The reason that a finite number of transformations suffices is that each individual transformation makes progress, but the entries stay sorted, and the first coordinate never changes.) Next, we apply the second transformation repeatedly, starting from w′. Again, after a finite number of transformations, we end up with a very good vector w″ which dominates w′ and therefore dominates w. An upper bound on ζ(w″, d) gives an upper bound on ζ(w, d). So to finish the proof, we must show that the two transformations are possible. Then we must show that for every very good vector w with w_1 ≥ 2, ζ(w, d) ≤ τ_2, and that for every very good vector w with w_1 ≥ 6, ζ(w, d) ≤ τ_6.
Transformation 1: Start with a partly good vector w. If w is fairly good, do nothing. Otherwise, w_d > max(w_1, k − 1). Choose the integer t to be as small as possible, subject to the constraint that, for all i in t + 1, . . . , d, we have w_i = w_d. Note that 1 ≤ t < d and w_t < w_d. Recall from Definition 19 that s_i = Σ_{j=1}^{i} max(0, k − w_j − 1). Since w_{t+1} = · · · = w_d, we have for any j ∈ {0, . . . , d − t} that s_{t+j} = s_t + j·max{0, k − w_d − 1}. The transformation therefore sets w′_{t+1} = · · · = w′_d = max(w_t, k − 1) and w′_1, . . . , w′_t = w_1, . . . , w_t. Transformation 2: Start with a fairly good vector w. If w is very good, do nothing. Otherwise, w_1 < w_d ≤ k − 1. Choose the integer t to be as small as possible, subject to the constraint that, for all i in t + 1, . . . , d, we have w_i = k − 1. Clearly, t ≤ d. We defined ξ(w, 0) to be 0, so from (42) we obtain (43). Now, choose the integer t′ to be as small as possible, subject to the constraint that, for all i ∈ {t′ + 1, . . . , t}, w_i = w_t. Since w is not very good, 1 ≤ t′ < t and w_{t′} < w_t < k − 1. Since w_{t′+1} = · · · = w_t, we can decompose the right-hand side of (43) accordingly. To finish the proof, we must show that for every very good vector w with w_1 ≥ 2, ζ(w, d) ≤ τ_2, and that for every very good vector w with w_1 ≥ 6, ζ(w, d) ≤ τ_6. Now, if w is a very good vector, then choose t as small as possible so that w_{t+1} = · · · = w_d = k − 1. Note that 0 ≤ t ≤ d, w_1 = · · · = w_t, and w_{t+1} = · · · = w_d = k − 1. Thus, (44) holds. To bound the terms in (44), note that ξ(k − 1, 0) = 0 and that (45) holds, where the last inequality follows from (2αc)^{−1} < 3/4 and k ≥ 200.
For t = 0, we have ξ(w_1, t) = 0 and ξ(k − 1, d − t) < 10^{−10} by (45). Thus, assume t > 0, so that w_1 < k − 1. Then, by Lemma 28, for k − 1 > w_1 ≥ 2 we have ξ(w_1, t) ≤ ξ(2, t). The numerical calculation in Section 8.1.5 shows that this is at most 4.5931. Since this plus 10^{−10} is less than τ_2, we obtain the first part of the lemma. Similarly, see Section 8.1.5 for the calculation in the last inequality. Since 2.78045 + 10^{−10} < τ_6, we obtain the second part of the lemma, and we have finished the proof.

A quick proof of a weaker theorem
Our goal is to prove Lemma 24, but the proof, which will be given in Section 4.3, is somewhat technical. In order to convey the intuition without getting into technical details, we first state and prove a weaker version of the lemma. Lemma 30, below, is identical to Lemma 24, except that the condition k ≥ ∆ has been strengthened to k ≥ 2.64∆. Using Lemma 30 in place of Lemma 24 strengthens the condition to k ≥ 2.64∆ in Lemmas 21 and 15. Thus, Lemma 30 immediately gives a weaker version of Theorem 3 in which the condition k ≥ ∆ is replaced by k ≥ 2.64∆.
Lemma 30. Let k and ∆ be two integers such that k ≥ 2.64∆ and ∆ ≥ 200. Let d be a positive integer such that d ≤ ∆ − 1. Let w = w_1, . . . , w_d be a suitable vector with b_2 = 0.
In the case d = ∆, for w = w_1, . . . , w_d a suitable vector with b_2 = 0 and all r satisfying the stated bounds, we obtain the bound (48), where in the last inequality we used the fact that ζ(w, d) ≤ τ_2 from the first part of Lemma 29 and the fact that δ^k ≤ δ^{2.64∆} = c^{2.64}. It is a matter of a simple numerical calculation to check that the right-hand side of (48) is less than 1; see Section 8.1.6 for details. Thus, we have shown that κ^{d,w}(r) < 1.
The case d = ∆ has the same proof; the only difference is that in equations (30) and (31) we replace δ^k by δ^{k−1}, losing a factor of 1/δ (the upper bound remains valid by the same argument as before, now using the corresponding inequality).

The Proof of Lemma 24
Recall that our actual goal is to prove Lemma 24, which is stronger than Lemma 30 because it only assumes k ≥ ∆, not k ≥ 2.64∆. Recall the expression for κ^{d,w}(r), where the function g(y, w) is given by (32). Using the fact that k ≥ ∆ and δ^∆ = c, we obtain the inequality (49). Lemma 27 gives an upper bound on g(y, w) in terms of the constants K_w. Since K_w = 1 for all w ≥ 5, we want to split the sum in (49) between w ≤ 5 and w ≥ 6. More generally, we split the summation in the bound (49) at an index t ≤ d using Lemma 27, obtaining (50), where w′ = {w_{t+1}, . . . , w_d} and the function ζ(w, d) is defined in (41).
Intuitively, the term ζ(w′, d − t) bounds a tail sum coming from the last d − t clauses, corresponding to the entries w_{t+1}, . . . , w_d. Recall from the statement of Lemma 24 and the sentences following it that the w_j's are in increasing order. Preferably, we want to choose the index t to split the sum in (50) so that w_t ≤ 5 and w_{t+1} ≥ 6. However, we also do not want too many terms in the first sum (since each of these will cost us work), so we insist that t ≤ 8. When t = 8, we will use the bound w_{t+1} ≥ 2. If t < 8, we will be able to use the stronger bound w_{t+1} ≥ 6.
Combining (56), (57) and Lemma 32, we get an upper bound on κ^{d,w}(r) provided w is such that t_6 ≤ 7. If t_6 ≥ 8, we will set t = 8 in (50). Similarly to the derivation of (56), we get (58), where each y_i ∈ [Y_0, 1] and where K_2 and τ_2 can be found in Table 1. Similarly to (57), define σ_{8,2}(y) to be the right-hand side of (58). The next lemma bounds σ_{8,2}(y) and is proved in Section 4.4.
We can now prove Lemma 24, which we restate here for convenience.
The case d = ∆ has the same proof. As in the proof of Lemma 30, the only difference in the d = ∆ case is that in equations (30) and (31) we replace δ^k by δ^{k−1}, losing a factor of 1/δ (the upper bound remains valid by the same argument as before, now using the corresponding inequality).

Remaining Proofs
In this section we provide technical details of Lemma 32 and Lemma 33, which we restate for convenience.

Proof. Recall that
where c_5 = c^{1−6/200}, c = 0.7 and τ_6 = 2.7805 are as in Table 1, and the function h(y) is given by (55). For t = 0, we have the required bound; see Section 8.1.8 for the calculation. For t = 1, we have the required bound; see Section 8.1.8 for the verification using Mathematica's Resolve function. Thus, we may assume that t ≥ 2 henceforth. For the sake of contradiction, suppose that there exists y′ such that σ_{t,6}(y′) > 1 for some 2 ≤ t ≤ 7. We will gradually adjust the variables y′_i without decreasing σ_{t,6}(y′) until there is only one variable left, at which point we will be able to exclude the possibility that σ_{t,6}(y′) > 1.
We first observe that the partial derivative of σ_{t,6}(y) with respect to y_i is given by (62) if Y_0 ≤ y_i ≤ Y_1, and by (63) if Y_1 ≤ y_i ≤ 1.
Suppose that there exists an index 3 ≤ i ≤ t such that y′_i ≤ Y_1. Using our initial assumption that σ_{t,6}(y′) > 1, we then have (from (62) and y′_j ≥ Y_0) that ∂σ_{t,6}(y)/∂y_i > 0 for any 2 ≤ t ≤ 7 and 3 ≤ i ≤ t; see Section 8.1.8 for the verification of the last inequality. Hence σ_{t,6}(y) is increasing in this y′_i, and we may thus assume y′_i ≥ Y_1 for all 3 ≤ i ≤ t. Then, using that y′_i ≥ Y_1 for all 3 ≤ i ≤ t and y′_1 ≥ Y_0, together with our assumption that σ_{t,6}(y′) > 1, we have (from (62)) that ∂σ_{t,6}(y)/∂y_2 > 0 for any 2 ≤ t ≤ 7; see Section 8.1.8 for the verification of the last inequality. Arguing as before, we may therefore assume y′_i ≥ Y_1 for all 2 ≤ i ≤ t. Suppose now that there exists an index 2 ≤ i ≤ min{5, t} such that y′_i > Y_1. Since y′_i ≥ Y_1 for all 2 ≤ i ≤ t and y′_1 ≥ Y_0, we obtain (from (57)) an upper bound (66) on σ_{t,6}(y′) (using also the fact that h is decreasing). Plugging (66) into (63) and using the fact that the y_j's are at most 1, we obtain that ∂σ_{t,6}(y)/∂y_i < 0; see Section 8.1.8 for the verification of the last inequality. Therefore σ_{t,6}(y) is decreasing in y′_i for any 2 ≤ i ≤ min{5, t}, and we may therefore assume that y′_i = Y_1 for all 2 ≤ i ≤ min{5, t}. For 2 ≤ t ≤ 5, using the fact that y′_2 = · · · = y′_t = Y_1, we thus obtain an inequality which is false for all y′_1 ∈ [Y_0, 1]; see Section 8.1.8 for the proof using Mathematica's Resolve function. Similarly, for t = 6, using that y′_2 = · · · = y′_5 = Y_1, Y_1 ≤ y′_6 ≤ 1 and that h is decreasing, we obtain an inequality which is false for all y′_1 ∈ [Y_0, 1]; see Section 8.1.8 for the proof using Mathematica's Resolve function. Finally, for t = 7, we have that y′_2 = y′_3 = y′_4 = y′_5 = Y_1. Using this and Y_1 ≤ y′_6, y′_7 ≤ 1, we obtain an inequality which is false for all y′_1 ∈ [Y_0, 1]; see Section 8.1.8 for the proof using Mathematica's Resolve function.
Thus, for all 2 ≤ t ≤ 7, the assumption that there exists y′ such that σ_{t,6}(y′) > 1 has led to a contradiction, completing the proof of Lemma 32 for all 0 ≤ t ≤ 7.
The proof of Lemma 33 is very similar.

Proof. Recall that
where c_5 = c^{1−6/200}, c = 0.7, K_2 = 1.11614 and τ_2 = 2.7805 are as in Table 1, and the function h(y) is given by (55). For the sake of contradiction, suppose that there exists y′ such that σ_{8,2}(y′) > 1. We will gradually adjust the variables y′_i without decreasing σ_{8,2}(y′) until only one variable is left, at which point we can directly verify that σ_{8,2}(y′) > 1 is impossible.
Identically to Lemma 32, the partial derivative of σ_{8,2}(y) with respect to y_i takes one expression if Y_0 ≤ y_i ≤ Y_1 and another if Y_1 ≤ y_i ≤ 1. We may thus use the same line of argument as in Lemma 32 to conclude that we may assume y′_i ≥ Y_1 for 3 ≤ i ≤ 8 (by verifying (64) for t = 8 and 3 ≤ i ≤ 8), and then bootstrap that to y′_i ≥ Y_1 for 2 ≤ i ≤ 8 (by verifying (65) for t = 8 and i = 2); see Section 8.1.9 for the details of the verification.
We thus obtain an upper bound (71) on σ_{8,2}(y′) (this is an analogue of (66), obtained using the fact that h is decreasing). Now suppose that there exists an index 2 ≤ i ≤ 5 such that y′_i > Y_1. We plug (71) into (63) and obtain an analogue of (67); see Section 8.1.9 for the details of the verification of the last inequality. Thus, we may assume that y′_i = Y_1 for 2 ≤ i ≤ 5. We can then bootstrap our bound on σ_{8,2}(y′) in (71) to a sharper bound, which gives (74), where the last inequality holds for all y′_1 ∈ [Y_0, 1]; see Section 8.1.9 for the verification using Mathematica's Resolve function. This implies that we can set y′_6 = Y_1 as well. Using y′_2 = · · · = y′_5 = y′_6 = Y_1, Y_1 ≤ y′_7, y′_8 ≤ 1 and the fact that h is decreasing, we obtain an inequality which is false for all y′_1 ∈ [Y_0, 1]; see Section 8.1.9 for the proof using Mathematica's Resolve function.
Thus, the assumption that there exists y ′ such that σ 8,2 (y ′ ) > 1 has led to a contradiction. This completes the proof of the lemma.
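The single-variable endgame of these proofs is delegated to Mathematica's Resolve, which decides statements of the form Resolve[ForAll[y, Y_0 ≤ y ≤ 1, σ(y) ≤ 1], Reals]. As a minimal sketch of this kind of check, the snippet below verifies a toy stand-in inequality exactly over an interval (the polynomial p and the endpoint Y_0 are hypothetical illustrations, not the actual σ_{8,2}):

```python
# A pure-Python sketch of the single-variable checks delegated to
# Mathematica's Resolve above.  The inequality is a TOY stand-in:
# we check p(y) <= 1 on [Y0, 1] for p(y) = 1/2 + y^2/3.
from fractions import Fraction

Y0 = Fraction(1, 32)              # hypothetical lower endpoint

def p(y):
    return Fraction(1, 2) + y * y / 3

# q(y) = 1 - p(y) = 1/2 - y^2/3 is decreasing on [Y0, 1] (its derivative
# -2y/3 is negative there), so q attains its minimum at y = 1.
assert 1 - p(Fraction(1)) == Fraction(1, 6) > 0

# Exact rational evaluation on a grid as an extra sanity check.
for i in range(33):
    y = Y0 + (Fraction(1) - Y0) * i / 32
    assert p(y) < 1
print("toy inequality p(y) <= 1 verified exactly on [Y0, 1]")
```

The monotonicity-plus-endpoint argument mirrors the hand arguments used elsewhere in the section; Resolve performs the analogous quantifier elimination symbolically.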

Bounding the decay rate for k = 3, ∆ = 6
In this section, we fix k = 3 and ∆ = 6 and prove Lemma 22, which we restate here for convenience.
Lemma 22. Let ∆ = 6 and k = 3. There are constants 0 < δ < 1 and U > 0 such that, for all 1 ≤ d ≤ ∆, all suitable w = w_1, . . . , w_d, and all r satisfying (1/2)^{∆−1} · 1 ≤ r ≤ 1, it holds that κ*_{d,w}(r) ≤ 1 if d ≤ ∆ − 1, while κ*_{d,w}(r) ≤ U if d = ∆.
In the statement of Lemma 22, d is an integer between 1 and 6. The vector w is a suitable vector, as defined in Definition 19. This means that the entries w_1, . . . , w_d are in nondecreasing order, except that any "1" entries are moved to the end. The definition of "suitable" also includes some global notation which depends implicitly on w: for every positive integer ℓ, b_ℓ is the number of entries amongst w_1, . . . , w_d which are equal to ℓ − 1. Hence Σ_{ℓ=2}^∞ b_ℓ = d. Also, w_1, . . . , w_{b_3} are all equal to 2, whereas for i > b_3, w_i is either 1 or at least 3. We recall the definition of κ*_{d,w}(r) from (12), which we have specialised here to k = 3. Consider the following definitions, which apply to all suitable w.
We can now state a lemma which immediately implies Lemma 22, since κ*_{d,w}(r) ≤ κ_{d,w}(r).
Lemma 35. Let ∆ = 6 and k = 3. There is a constant U > 0 such that, for all 1 ≤ d ≤ ∆, all suitable w = w_1, . . . , w_d, and all r satisfying (1/2)^{∆−1} · 1 ≤ r ≤ 1, it holds that κ_{d,w}(r) ≤ 1 if d ≤ ∆ − 1, while κ_{d,w}(r) ≤ U if d = ∆.
The rest of the section contains the proof of Lemma 35. This is a more involved optimisation problem than the one that arose in the proof of Lemma 21. Before delving into the details, we set up some convenient notation and then give a roadmap of the argument.

Outline of the proof
For convenience, let F := F_{d,w} and κ(r) := κ_{d,w}(r), which is defined in (77), and for i ∈ [d], let ρ_i(r) := ρ_{w,i}(r), which is defined in (76). Our goal is to show that, for all r such that (1/2)^{∆−1} · 1 ≤ r ≤ 1, it holds that κ(r) ≤ 1 when d ≤ ∆ − 1 and that κ(r) is bounded by a constant when d = ∆.
Here is a rough outline of our analysis.
1. The first part of the proof will be to bound the quantities ρ_i(r) appropriately for each i ∈ [d]. Namely, the main goal here will be to replace the {r_{i,j}}_{j∈[w_i]} by a single suitable quantity. In fact, for this part of the proof, rather than working with the r_{i,j}'s, it will be easier to work with t_{i,j} = r_{i,j}/(1 + r_{i,j}); see Lemma 36.
2. After the first part, we will have significantly reduced the dimensionality of the optimisation problem: from the initial Σ_{i∈[d]} w_i variables {r_{i,j}}_{i∈[d], j∈[w_i]}, we will be left with just d "representative" variables (one for each i).
3. Despite having reduced the number of variables quite a lot, the w_i's can so far be arbitrarily large integers. It will be convenient for us to restrict the range of the w_i's. Using a rather crude argument, we will be able to restrict our attention to i's such that 1 ≤ w_i ≤ 5. Intuitively, the reason is that large w_i's make κ smaller; we quantify this effect in a way that is appropriate for our analysis. (See Lemma 37.)
4. The next step is a further reduction of the number of variables. In particular, recall from Item 2 that we have reduced to the case where the number of variables is d (one for each i) and, from Item 3, that for each i ∈ [d] it holds that 1 ≤ w_i ≤ 5. We will further group these variables together according to their values. That is, for each integer 1 ≤ w ≤ 5 we will be able to use a single variable (indexed by w) to capture the aggregate contribution of the variables with w_i = w. (See Equation (98).)

The details of the argument
In this section, we expand the outline of Section 5.1 in detail and give the technical ingredients needed to complete the proof of Lemma 35. Later subsections contain the remaining technical proofs, which would otherwise significantly interrupt the flow. The first part of the proof will be to bound the quantities ρ_i(r) appropriately. For i ∈ [d] and j ∈ [w_i], we obtain an expression for the relevant partial derivative by differentiating ln F(r), as in the proof of Lemma 21.
The quantity ρ_i(r) can then be written in terms of these derivatives. Let t_{i,j} := r_{i,j}/(1 + r_{i,j}) and let t̄_i be the geometric mean of the t_{i,j}'s, i.e., (t̄_i)^{w_i} := Π_{j=1}^{w_i} t_{i,j} = Π_{j=1}^{w_i} r_{i,j}/(1 + r_{i,j}). As we shall see soon, t̄_i will be used to capture the "aggregate" effect of w_i. Let t be the vector whose entries are given by t_{i,j} with i ∈ [d] and j ∈ [w_i]. We will view the quantities κ(r) and ρ_i(r) as functions of t. For that, it will be convenient to consider the function h(t) for t ∈ [1/(2^{∆−1} + 1), 1/2]. With this preprocessing, for i ∈ [d], the quantities ρ_i(r) and κ(r), viewed as functions of t, take the forms given in (80), where F(t) is given by (81). After this preliminary step, for i ∈ [d], we will pursue the task of substituting the variables t_{i,j} with j ∈ [w_i] by a single variable t̄_i. Let t̄ = {t̄_i}_{i=1,...,d} and note that (1/(2^{∆−1} + 1)) · 1 ≤ t̄ ≤ (1/2) · 1. As a starting point, we record a preliminary bound. Recall that ∆ = 6, δ = 9789/10000, χ = 1/2, ψ = 13/10. The following technical lemma, proved in Section 5.3, will be crucial in reducing the number of the variables t_{i,j}.
Then, for all positive integers w, the stated inequality holds for all t_1, . . . , t_w ∈ [0, 1/2], where t̄ is the geometric mean of the t_i's, i.e., t̄ = (t_1 · · · t_w)^{1/w}.
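As a quick numerical sanity check of this change of variables (not part of the proof): the map r ↦ r/(1 + r) is increasing and sends the box (1/2)^{∆−1} ≤ r ≤ 1 onto 1/(2^{∆−1} + 1) ≤ t ≤ 1/2, and t̄_i is determined by the product of the t_{i,j}:

```python
# Sanity check of the substitution t_{i,j} = r_{i,j}/(1 + r_{i,j}) with
# Delta = 6: r in [(1/2)^(Delta-1), 1] maps onto t in
# [1/(2^(Delta-1)+1), 1/2], and (tbar_i)^{w_i} = prod_j t_{i,j}.
from fractions import Fraction
import math

Delta = 6
r_lo, r_hi = Fraction(1, 2 ** (Delta - 1)), Fraction(1)

def to_t(r):
    return r / (1 + r)

# Endpoint images of the (increasing) map r -> r/(1+r).
assert to_t(r_lo) == Fraction(1, 2 ** (Delta - 1) + 1)
assert to_t(r_hi) == Fraction(1, 2)

# Geometric-mean identity: (tbar_i)^{w_i} = prod_j t_{i,j}.
r_vals = [0.1, 0.5, 0.9]                       # an arbitrary sample
t_vals = [r / (1 + r) for r in r_vals]
tbar = math.prod(t_vals) ** (1 / len(t_vals))
assert abs(tbar ** len(t_vals) - math.prod(t_vals)) < 1e-12
print("change of variables and geometric-mean identity check out")
```
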

Applying Lemma 36 to the quantity ρ i (t) in (80) yields that for all i ∈ [d] it holds that
where F(t) is given by (81). The next part of the proof will be to bound the contribution of the w_i's with w_i ≥ 6 by small quantities, so that we can eliminate those i with w_i ≥ 6 (and hence the respective variables t̄_i) from consideration. This will be accomplished by the following lemma (proved in Section 5.4).
Lemma 37. Let M = 25/1000. Recall that F(t) is given by (81). Let i be such that w_i ≥ 6. Then, for all t̄ such that 0 ≤ t̄ ≤ (1/2) · 1, inequalities (133)–(135) hold.
Recall from the beginning of Section 5 (based on Definition 19) that b_ℓ is the number of entries amongst w_1, . . . , w_d which are equal to ℓ − 1. Using Lemma 37 we will now be able to eliminate those i such that w_i ≥ 6. In order to do that easily, we first re-order the entries in w. Let B = b_2 + b_3 + b_4 + b_5 + b_6 and note that 0 ≤ B ≤ d. Note that, in the context of (80), the ordering of the w_i's with w_i ≠ 2 does not matter (as long as we maintain the invariant that their index i satisfies i ≥ b_3 + 1). Thus, from now on, without loss of generality, we will assume that w_i ≥ 6 implies i ≥ B + 1; that is, w_i's with w_i ≥ 6 have larger indices than w_i's with w_i ≤ 5.
To complete the program of eliminating those variables t̄_i for which w_i ≥ 6, observe that zϕ(z) = 1/(ψ − z^χ), which gives (85). Using (85), we thus obtain the bound (86), where κ^{(2)}(t̄) is defined in (87). The quantity κ^{(3)}(t̄), defined in (88), is similar to κ^{(2)}(t̄): the only difference is that the term (d − B)M is not present in the numerator of κ^{(3)}(t̄). We therefore have the bound (89), where the last inequality follows from b_2 ≥ 0, δ ∈ (0, 1] and the fact that the t̄_i's are positive.
The following lemma, proved later in this section, will allow us to conclude Lemma 35.
Lemma 38. Let ∆ = 6 and B be a non-negative integer less than or equal to ∆ − 1 = 5. Then, for all t̄ with 0 ≤ t̄ ≤ (1/2) · 1, it holds that κ^{(3)}(t̄) ≤ ε_B, where the constants ε_B are as in (90).
Lemma 35. Let ∆ = 6 and k = 3. There is a constant U > 0 such that, for all 1 ≤ d ≤ ∆, all suitable w = w_1, . . . , w_d, and all r satisfying (1/2)^{∆−1} · 1 ≤ r ≤ 1, it holds that κ_{d,w}(r) ≤ 1 if d ≤ ∆ − 1, while κ_{d,w}(r) ≤ U if d = ∆.
Proof of Lemma 35. We first derive the bound κ(r) ≤ 1 when d ≤ ∆ − 1 = 5. Recall that the quantity κ(r) (as given in (77)) is equal to the quantity κ̄(t) (as given in (80)). Also, we have shown in (83) that κ̄(t) is bounded in terms of κ^{(1)}(t̄), where κ^{(1)}(t̄) is as in (84). We have also shown the bound (86), where κ^{(2)}(t̄) is as in (87). Moreover, we showed the bound (89), where κ^{(3)}(t̄) is as in (88), B is a non-negative integer less than or equal to ∆ − 1 = 5 satisfying B = b_2 + b_3 + b_4 + b_5 + b_6, and M = 25/1000 is as in Lemma 37. Lastly, by Lemma 38, we have a bound on κ^{(3)}(t̄) in terms of the constants ε_B, which are as in (90). Combining all of the above, we obtain the bound (91). It is a matter of numerical calculations to check that the resulting quantity is at most 1 for all B = 0, 1, . . . , 5 and d = ∆ − 1 = 5; see Section 8.2.1 for the explicit calculations. This completes the proof of the lemma for d ≤ ∆ − 1 = 5.
We next consider the case d = ∆, i.e., we show that there exists a constant U > 0 such that κ(r) ≤ U. This will follow by continuity arguments. More precisely, first note that inequalities (83), (86) and (89) still hold in the case where d = ∆, with the minor modification that in (89), the integer B (which, recall, is equal to b_2 + b_3 + b_4 + b_5 + b_6) can be as large as (but not bigger than) ∆. Let us fix B to be a non-negative integer which is at most ∆ = 6. Observe that there are finitely many possibilities for the non-negative integers b_2, b_3, b_4, b_5, b_6. For each such choice, the quantity κ^{(3)}(t̄) is a continuous function of the (finitely many) variables t̄_1, . . . , t̄_B and hence is bounded above by an absolute constant when 0 ≤ t̄ ≤ (1/2) · 1. It follows that for every non-negative integer B ≤ ∆ = 6, there exists an absolute constant U_B > 0 such that κ^{(3)}(t̄) ≤ U_B. Thus, analogously to (91), we obtain the bound (93). Note that U, as defined in (93), is a constant. The desired bound on κ(r) when d = ∆ follows.
This concludes the proof of Lemma 35.
The remainder of this section focusses on the proof of Lemma 38. We begin by reducing the number of variables. We first need the following transformation: for w = 1, 2, . . . and all i ∈ [B] such that w_i = w, we set y_i := 1 − t̄_i^{w_i} (note that y_i ∈ [1 − (1/2)^w, 1]). The quantity κ^{(3)}(t̄), written as a function of y = {y_i}_{i=1}^B and the functions g_w, then becomes the quantity κ^{(4)}(y). Let ŷ_w be the geometric mean of those y_i's with w_i = w (note that the number of such i's is equal to b_{w+1}); when b_{w+1} = 0, let ŷ_w = 1. Let ŷ denote the resulting vector. Our goal will be to bound κ^{(4)}(y) by a function of ŷ.
Lemma 39. For w = 1, 2, . . . , 5, the function g_w(e^z) is a concave function of z in the interval [ln(1 − (1/2)^w), 0].
Proof. For z ∈ [ln(1 − (1/2)^w), 0], let f(z) = g_w(e^z). Our goal is to show that, for w = 1, . . . , 5, it holds that f″(z) ≤ 0 on this interval. For convenience, we use Mathematica's Resolve function; see Section 8.2.3 for details.
Lemma 39 and Jensen's inequality yield a bound on κ^{(4)}(y) in terms of ŷ. To bound Σ_{i=1}^{b_3} (1/δ^{b_3−i}) g_2(y_i) by a function of ŷ_2, we will use the following lemma (proved in Section 5.5).
Lemma 40. Let ∆ = 6 and b 3 be a non-negative integer less than or equal to ∆ − 1 = 5.
We use Lemma 41 to show the following.
Proof of Lemma 42. We may assume that b_4 ≥ 1 (when b_4 = 0, the bounds in the lemma follow immediately from Lemma 41). For b_5 = b_6 = 0, the quantity κ^{(5)}(ŷ) simplifies, where we use that K_δ^{(1)} = 1 (note that the values of the variables ŷ_4, ŷ_5 do not affect the value of κ^{(4)} when b_5 = b_6 = 0). The proof splits into two cases depending on whether b_3 is zero.
Case I: b_3 ≥ 1. For the relevant aggregate quantity A, we have the crude bound 0 ≤ A ≤ 1. By Lemma 41 (see also (100)), we obtain a bound in which the values of the constants τ_{b_2,b_3} are as in Lemma 41 (cf. equation (99)). We next perform a transformation of the variable ŷ_3 (similar to the one used in the proof of Lemma 41): we set v_3 := (1 − ŷ_3)^{1/3}, so that v_3 ∈ [0, 1/2]. From the definition of the function g_3 (cf. equation (94)), we obtain an expression for g_3 in terms of v_3. It follows that the quantity κ^{(6)}(A, ŷ_3) can be written as a function of A and v_3, and we will show that the required inequality holds. We use Mathematica's Resolve function; see Section 8.2.6 for details.
Case II: b_3 = 0. For b_3 = b_5 = b_6 = 0, the quantity κ^{(4)}(ŷ) simplifies further. We next perform a transformation on the variables ŷ_1, ŷ_3 (similar to the one used in the proof of Lemma 41): we set v_1 = 1 − ŷ_1 and v_3 := (1 − ŷ_3)^{1/3}. Using (101) and (102), we obtain an expression for κ^{(5)} in terms of v_1 and v_3. This quantity is still too complicated for Mathematica to resolve efficiently, so we need one more transformation. In particular, let u_1, u_3 be suitably defined positive reals, and note that 0 ≤ u_1, u_3 ≤ 1. Writing κ^{(8)} in terms of u_1, u_3, we will show that the required inequality holds. We use Mathematica's Resolve function; see Section 8.2.6 for details. This completes the case analysis and therefore the proof of Lemma 42.
Proof of Lemma 43. We may assume that b_5 ≥ 1 (when b_5 = 0, the bounds in the lemma follow immediately from Lemma 42). For b_6 = 0, the quantity κ^{(5)}(ŷ) simplifies (note that the value of the variable ŷ_5 does not affect the value of κ^{(5)} in this case). For the relevant aggregate quantity A, we have the crude bound 0 ≤ A ≤ 1. By Lemma 42, we obtain a bound in which the values of the constants τ_{B′,0} are given by equation (99). Using that δ^{b_2} ≤ 1, the bound simplifies further. We next perform a transformation on the variable ŷ_4 (similar to the one used in the proof of Lemma 41): we set v_4 := (1 − ŷ_4)^{1/4}, so that v_4 ∈ [0, 1/2]. From the definition of the function g_4 (cf. equation (94)), we obtain an expression for g_4 in terms of v_4. It follows that the quantity κ^{(6)}(A, ŷ_4) can be written as a function of A and v_4, and we will show that the required inequality holds.
Lemma 44. Let B be a non-negative integer less than or equal to ∆ − 1 = 5. For all non-negative integers b_2, b_3, b_4, b_5, b_6 such that b_2 + b_3 + b_4 + b_5 + b_6 = B, it holds that κ^{(5)}(ŷ) ≤ τ_{B,0}, where the constants τ_{B,0} are given by (99).
Proof of Lemma 44. We may assume that b_6 ≥ 1 (when b_6 = 0, the bounds in the lemma follow immediately from Lemma 43). Recall the expression for the quantity κ^{(5)}(ŷ). For the relevant aggregate quantity A, we have the crude bound 0 ≤ A ≤ 1. By Lemma 41, we obtain a bound in which the values of the constants τ_{B′,0} are given by equation (99). Using that δ^{b_2} ≤ 1, the bound simplifies further. We next perform a transformation on the variable ŷ_5 (similar to the one used in the proof of Lemma 41): we set v_5 := (1 − ŷ_5)^{1/5}, so that v_5 ∈ [0, 1/2]. From the definition of the function g_5 (cf. equation (94)), we obtain an expression for g_5 in terms of v_5. It follows that the quantity κ^{(6)}(A, ŷ_5) can be written as a function of A and v_5, and we will show that the required inequality holds. We use Mathematica's Resolve function; see Section 8.2.8 for details.
The proof of Lemma 38, which was important in proving Lemma 35, is now immediate.
Lemma 38. Let ∆ = 6 and B be a non-negative integer less than or equal to ∆ − 1 = 5. Then, for all t̄ with 0 ≤ t̄ ≤ (1/2) · 1, it holds that κ^{(3)}(t̄) ≤ ε_B, where the constants ε_B are as in (90).

Simplifying the optimisation using geometric means
In this section, we prove Lemma 36, which we restate here for convenience. Roughly, the lemma bounds the contribution to κ of a w_i with w_i = w. The main accomplishment here is the significant reduction of the number of variables: initially the contribution is a function of w variables t_1, . . . , t_w; the lemma shows that we can reduce the number of variables to one by considering the geometric mean of the t_j's. The challenge is to deal with the asymmetry caused by the δ terms without introducing too much slack in the argument, especially for small values of w (say w ≤ 4).
Lemma 36. Define the following constants.

Recall that the function h(t) is given by (55), where χ = 1/2 and ψ = 13/10. We begin with the following lemma.
For convenience, we use Mathematica's Resolve function; see Section 8.2.9 for the code.
For the bounds on K_δ^{(w)} stated in the lemma for w = 2, 3, 4, we will have to work harder. Our goal is to prove inequalities (110), (111) and (112) for t_i ∈ [0, 1/2] (i = 1, 2, . . .). To prove these, we will need the following inequalities.
Lemma 46. Let A_1, A_2 > 0 be real numbers. There exists A > 0 such that inequality (113) holds for t_1, t_2 ∈ [0, 1/2]. In particular, inequality (113) holds for the values of A_1, A_2, A given in (115)–(117).
Proof. The existence of such an A follows by standard continuity and compactness arguments, and the positivity of A is also easy to prove. We thus focus on the more intricate task of verifying (113) for the values of A_1, A_2, A given in the statement of the lemma. We will use Mathematica's Resolve function. To do this, we first need to rationalize the expressions, which can be achieved since χ = 1/2. In particular, we will use the transformations (118) and (119). Under these transformations, for χ = 1/2, we are quite close to rationalizing the desired inequality; we only have to address the rationalization of one remaining expression, for which we will have to explicitly eradicate the radical.
In particular, inequality (113) is equivalent to (120). Inequality (120) will follow from inequalities (121) and (122); note that (121) allows us to take square roots in (122), and thus (120) follows. It remains to prove (121) and (122). Using the substitutions (118) and (119), inequalities (121) and (122) are equivalent to rational inequalities, which can be resolved using Mathematica for the values of A_1, A_2, A given in the statement of the lemma; the code can be found in Section 8.2.10.
To prove (111), we will use inequality (125). Applying (125) with t_4 = (t_1 t_2 t_3)^{1/3} (note that with this value of t_4 it holds that (t_1 t_2 t_3 t_4)^{1/4} = t_4) proves (111). It remains to prove (125), which follows by adding inequalities (126), (127) and (128). Inequality (126) is an immediate consequence of Lemma 45. Inequality (127) is an immediate consequence of inequality (110) (multiplied by 1/δ^5). For inequality (128), we use the transformations u_1 = √(t_1 t_4) and u_2 = √(t_2 t_3), so that we need to show (129) for u_1, u_2 ∈ [0, 1/2], which follows from Lemma 46 (cf. the values (115)). Finally, we conclude with the proof of inequality (112). This is obtained by adding the three inequalities (130), (131) and (132). Inequality (130) is an immediate consequence of (110) (again, multiplied by 1/δ^5). Inequality (131) follows from Lemma 46 (cf. the values (116)). Finally, inequality (132) can be proved using a transformation analogous to the one used to prove (128); the required analogue of inequality (129) has been proved in Lemma 46 (cf. the values (117)). This concludes the proof of Lemma 36.
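The substitutions in this proof are engineered so that the geometric mean is preserved at each step. A small numerical check of the two identities used above, with arbitrary sample points t_1, t_2, t_3 ∈ [0, 1/2]:

```python
# Numerical check of the two geometric-mean identities driving the proof
# of Lemma 36: setting t4 = (t1 t2 t3)^(1/3) makes t4 the geometric mean
# of t1,...,t4, and the substitutions u1 = sqrt(t1 t4), u2 = sqrt(t2 t3)
# preserve that mean.  The sample points are arbitrary.
import math

t1, t2, t3 = 0.12, 0.37, 0.45          # arbitrary points in [0, 1/2]
t4 = (t1 * t2 * t3) ** (1 / 3)

gm4 = (t1 * t2 * t3 * t4) ** (1 / 4)   # geometric mean of all four
assert abs(gm4 - t4) < 1e-12

u1 = math.sqrt(t1 * t4)
u2 = math.sqrt(t2 * t3)
assert abs(math.sqrt(u1 * u2) - gm4) < 1e-12
assert 0 <= u1 <= 0.5 and 0 <= u2 <= 0.5
print("geometric-mean identities verified")
```
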

Eliminating large arity clauses from consideration
In this section, we prove Lemma 37, which we restate here for convenience. Recall that in the construction of the optimisation problem from the original correlation-decay argument, w_i is the arity of the i-th clause containing x, minus one. Intuitively, clauses with large arity should not significantly affect the correlation decay. The following lemma captures this in a quantitative way which is sufficient for our needs (for clauses with w_i ≥ 6).
We start with the verification of (133). Note that (133) holds at equality for t̄_i = 1/2, so it suffices to show that, for every integer w ≥ 6, the relevant function is increasing for t ∈ [0, 1/2]. For w = 6, there is nothing to show, so we may assume that w ≥ 7. We then calculate the derivative (see Section 8.2.2 for the calculation), so we only need to show that p(t) ≥ 0 for t ∈ [0, 1/2]. Note that p′(t) = 6w(t^{w−1} − t^6) ≤ 0 for all t ∈ [0, 1/2], since w ≥ 7. It follows that p(t) ≥ p(1/2) ≥ 0, where in the last inequality we again used that w ≥ 7. This completes the verification of (133). We next verify (134). For convenience, we use Mathematica's Resolve function; see Section 8.2.2.
Finally, we verify (135). Since ℓ_w ≤ log_6(w + 1) + 1 and α = 1 − 10^{−4} < 1, it suffices to show (137) for all w ≥ 6. It is a matter of numerical calculations to show that α^{−1} ≤ exp(11 · 10^{−5}). Thus, to show (137), it suffices to show (138). We view the lhs of (138) as a function of w, say f(w), and prove the two inequalities in (139), from which inequality (138) follows. The first inequality in (139) follows by a numerical calculation; see Section 8.2.2 for details. For the second inequality in (139), note that (2^w − 1)/(2^{w+1} − 1) ≤ 1/2 and hence, for w ≥ 6, we have the stated bound, where the last inequality follows by a numerical calculation; see Section 8.2.2 for details. This concludes the proof of (139) and thus the proof of Lemma 37.
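The two elementary estimates in this proof are easy to spot-check numerically (a sampled check, not a proof):

```python
# Spot-check of the two elementary estimates used above:
# p'(t) = 6w (t^{w-1} - t^6) <= 0 on [0, 1/2] for w >= 7, and
# alpha^{-1} <= exp(11 * 10^{-5}) for alpha = 1 - 10^{-4}.
import math

for w in range(7, 20):
    for i in range(501):
        t = i / 1000.0                     # t ranges over [0, 0.5]
        # t^{w-1} <= t^6 since t <= 1 and w - 1 > 6, so p'(t) <= 0.
        assert 6 * w * (t ** (w - 1) - t ** 6) <= 1e-15

alpha = 1 - 1e-4
assert 1 / alpha <= math.exp(11e-5)
print("derivative sign and alpha bound verified on the sampled grid")
```
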

The contribution of arity 3 clauses
In this section, we give the proof of Lemma 40. Roughly, the lemma bounds the aggregate contribution of arity-3 clauses, along with the effect of the creation of arity-2 clauses (due to the pinnings performed when processing arity-3 clauses). This was used to further reduce the number of variables.
Lemma 40. Let ∆ = 6 and b_3 be a non-negative integer less than or equal to ∆ − 1 = 5. There exists a constant C_δ for which the stated bound holds. In particular, we will show that the inequality holds with the explicit constants 102/100 and 104/100. Note that for b_3 = 1, . . . , 5 the stated estimate holds. Using this and q = p/(p − 1), we obtain the claimed bound.

Hardness for Approximate Counting
In this section, we prove the hardness results stated in the introduction for the problem of counting independent sets in hypergraphs and the problem of counting dominating sets in graphs.

Counting independent sets in hypergraphs
In this section, we prove inapproximability results for the #HyperIndSet(k, ∆) problem. For this section, it will be convenient to return to the original hypergraph independent set formulation of the problem (instead of the monotone CNF formulation). The proof is via a reduction from the independent set model on graphs, following the reduction used by Bordewich et al. [2]. The precise inapproximability results for the hard-core model had not yet been proved at the time [2] was written, so we carry out the details explicitly to obtain the bound that their reduction gives. Namely, we will use the inapproximability result of Sly and Sun [18] for the hard-core model. We first remind the reader of the relevant definitions. Let λ > 0. For a graph G = (V, E), the hard-core model with parameter λ is a probability distribution over the set of independent sets of G; each independent set I of G has weight proportional to λ^{|I|}. The normalizing factor of this distribution is the partition function Z_G(λ), formally defined as Z_G(λ) := Σ_I λ^{|I|}, where the sum ranges over all independent sets I of G.
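For very small graphs, Z_G(λ) can be computed by brute-force enumeration, which is handy for checking reductions like the one below. A minimal sketch on the 5-cycle (the graph and the value of λ are chosen arbitrarily for illustration):

```python
# Brute-force illustration of the hard-core partition function
# Z_G(lambda) = sum over independent sets I of lambda^{|I|}.
from itertools import combinations

def hardcore_Z(vertices, edges, lam):
    """Sum lam^{|I|} over all independent sets I of the graph."""
    total = 0.0
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            s = set(subset)
            if all(not (u in s and v in s) for (u, v) in edges):
                total += lam ** r
    return total

C5_vertices = list(range(5))
C5_edges = [(i, (i + 1) % 5) for i in range(5)]

# C5 has 11 independent sets: the empty set, 5 singletons, 5 pairs,
# so Z_{C5}(lambda) = 1 + 5*lambda + 5*lambda^2.
assert hardcore_Z(C5_vertices, C5_edges, 1.0) == 11.0
assert abs(hardcore_Z(C5_vertices, C5_edges, 0.5) - 4.75) < 1e-12
print("Z_{C5}(1) = 11, Z_{C5}(1/2) = 4.75")
```
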
Note that λ > λ_c(∆), where λ_c(∆) is as in Theorem 48. For convenience, let k′ := ⌈k/2⌉ in what follows. Let G = (V, E) be a ∆-regular graph and set n := |V|. We will construct a (2k′)-uniform hypergraph H = (U, F) with maximum degree ∆ such that |U| = k′|V|, |F| = |E|, and such that (143) holds. Note that the size of H is larger than the size of G only by a constant factor. It thus follows that if we could approximate #HyperIndSet(k, ∆) within an arbitrarily small exponential factor, we could also approximate Z_G(λ) within an (arbitrarily small) exponential factor for all ∆-regular graphs G, contradicting Theorem 48. It remains to construct the hypergraph H = (U, F). Every vertex v of G maps to a (distinct) set of k′ vertices in H, the set {u_{v,1}, . . . , u_{v,k′}}, which we will henceforth denote by S_v. Further, each edge (v, w) of G maps to a hyperedge of H given by S_v ∪ S_w. It is clear from the construction that every vertex of H has degree ∆ (since G is a ∆-regular graph) and, further, that every hyperedge of H has arity 2k′ ≥ k. Also, note that |U| = k′|V| and |F| = |E|. We complete the proof by showing (143). To do this, we map independent sets of the hypergraph H to independent sets of the graph G as follows. Let I_H be an independent set of H. Define I_G to be the subset of vertices of G such that v ∈ I_G iff S_v ⊆ I_H. It is immediate that I_G is an independent set of G. In fact, it is not hard to see that for every independent set I_G of G there are exactly (2^{k′} − 1)^{n−|I_G|} independent sets of H that map to I_G. From this, (143) follows, completing the proof.
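Since the display (143) itself is not reproduced here, the counting fact just established suggests it has the form Z_H = (2^{k′} − 1)^n · Z_G(λ) with λ = 1/(2^{k′} − 1); treat this exact form as a reconstruction. The sketch below verifies the reconstructed identity by brute force on a toy instance (G the 4-cycle, k′ = 2):

```python
# Brute-force check of the reduction on a toy instance: G = C4
# (2-regular), k' = 2.  The identity (143) is RECONSTRUCTED here as
#   Z_H = (2^{k'} - 1)^n * Z_G(lambda),  lambda = 1/(2^{k'} - 1),
# which follows from the counting fact stated above.
from itertools import product

n, kp = 4, 2
G_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]           # the 4-cycle

# S_v = {k'*v, ..., k'*v + k' - 1}; one hyperedge S_v u S_w per edge.
S = [set(range(kp * v, kp * v + kp)) for v in range(n)]
hyperedges = [S[v] | S[w] for (v, w) in G_edges]
U = kp * n

# Count independent sets of H: subsets containing no hyperedge entirely.
Z_H = 0
for bits in product([0, 1], repeat=U):
    chosen = {u for u in range(U) if bits[u]}
    if all(not e <= chosen for e in hyperedges):
        Z_H += 1

# Z_G(lambda) for G = C4: independent sets are {}, 4 singletons, 2 pairs.
lam = 1 / (2 ** kp - 1)                               # lambda = 1/3
Z_G = 1 + 4 * lam + 2 * lam ** 2

assert abs(Z_H - (2 ** kp - 1) ** n * Z_G) < 1e-9     # both sides: 207
print(Z_H)
```
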
The following corollary is a crude estimate of the range of ∆ in which #HyperIndSet(k, ∆) is hard to approximate (by applying Theorem 49).
Proof. For ∆ ≥ 5 · 2^{k/2}, we have the stated chain of inequalities. The first inequality follows from the fact that the relevant expression is increasing in ∆, together with the (trivial) absolute bound ∆ ≥ 6 (since k ≥ 2). The second inequality follows from the fact that ∆ ≥ 5 · 2^{k/2} and k ≥ 2. Finally, the last inequality is trivial. Now apply Theorem 49.

Counting dominating sets in graphs
In this section, we prove inapproximability results for the problem of counting dominating sets in graphs of maximum degree ∆. In contrast to Corollary 4 where we showed algorithmic results for ∆-regular graphs, here we consider graphs which are not necessarily regular but only have bounded maximum degree. Formally, we are interested in the following problem.
Instance A graph G with maximum degree at most ∆.
Output The number of dominating sets in G.
For graphs of unbounded degree, it was shown by Goldberg, Gysel and Lapinskas [8, Theorem 4] that it is #SAT-hard to approximate the number of dominating sets. We refine this result in the bounded-degree setting. More precisely, we show the following.
To prove Theorem 52, we will utilise inapproximability results for the partition functions of antiferromagnetic 2-spin systems on graphs. We give a quick overview of the relevant definitions and results, following [6, 11]. A 2-spin system on a graph is specified by three parameters β, γ ≥ 0 and λ > 0. For a graph G = (V, E), the configurations of the system are all possible assignments σ : V → {0, 1}, and the partition function Z_G(β, γ, λ) is defined in the standard way, with the convention that 0^0 ≡ 1 when one of the parameters β, γ is equal to zero. The case β = γ corresponds to the Ising model, while the case β = 0 and γ = 1 corresponds to the hard-core model (which we already encountered in Section 6.1).
To apply Theorem 53, we will need the following characterisation of the uniqueness regime on the infinite ∆-regular tree (see, e.g., [11, Lemma 21] or [6, Section 3] for more details). For a 2-spin system with parameters β, γ, λ, non-uniqueness on the infinite ∆-regular tree holds iff the system of equations (144) has multiple (i.e., more than one) positive solutions (x, y).
[18, Theorems 2 & 3] are about the hard-core and the antiferromagnetic Ising model on ∆-regular graphs. It is standard to derive from those Theorem 53 (which applies to general antiferromagnetic 2-spin systems), since it is well known (see, e.g., [18]) that antiferromagnetic 2-spin systems on ∆-regular graphs can be expressed in terms of either the Ising model or the hard-core model. The detailed derivation can be found in [6, Corollary 21].
We are now ready to give the proof of Theorem 52.
In Section 8.3.1, we use Mathematica to find that (144) has multiple positive solutions (x, y). Thus, the 2-spin system is in the non-uniqueness regime of the infinite ∆-regular tree, and hence Theorem 53 applies. Let G = (V, E) be a ∆-regular graph for which we want to compute Z_G(β, γ, λ). Set n := |V| and m := |E|. We will construct a graph G′ = (V′, E′) with maximum degree ∆′ = 18 such that |V′| = 2n + m, |E′| = 2m + n, and such that (145) holds, where #DomSets(G′) denotes the number of dominating sets of G′. Note that the size of G′ is larger than the size of G only by a constant factor. It thus follows that if we could approximate #DomSet(∆′) within an arbitrarily small exponential factor, we could also approximate Z_G(β, γ, λ) within an (arbitrarily small) exponential factor for all ∆-regular graphs G, contradicting Theorem 53. It remains to construct the graph G′ = (V′, E′), which is obtained from G as follows. Denote V = {v_1, . . . , v_n} and E = {e_1, . . . , e_m}. For each vertex v_i ∈ V, add a new vertex u_i and connect it to v_i. Further, for each edge e_t = (v_i, v_j), add a new vertex w_t, connect it to both v_i and v_j, and delete the edge e_t. In particular, we have that V′ = {v_1, . . . , v_n} ∪ {u_1, . . . , u_n} ∪ {w_1, . . . , w_m}. It is clear from the construction that every vertex of G′ has degree at most ∆′ = ∆ + 1 (since G is a ∆-regular graph, in G′ each of v_1, . . . , v_n has degree ∆′, each of u_1, . . . , u_n has degree 1, and each of w_1, . . . , w_m has degree 2). Further, it is clear that |V′| = 2n + m and |E′| = 2m + n.
We complete the proof by showing (145). To do this, we map dominating sets of G′ to configurations σ : V → {0, 1}. In particular, let S be a dominating set of G′. For a vertex v ∈ V, we set σ(v) = 1 iff v ∈ S. It then remains to observe that, for every σ : V → {0, 1}, there are exactly (146) dominating sets of the graph G′ that map to σ. To see this, fix σ : V → {0, 1} and consider the possibilities for a dominating set S of G′ that maps to σ. For every vertex v_i ∈ V, if σ(v_i) = 1, we have that v_i ∈ S, and hence the vertex u_i can either belong to S or not (2 choices). In contrast, if σ(v_i) = 0, we have that v_i ∉ S, and hence the vertex u_i must belong to S in order to be dominated, since its only neighbour is the vertex v_i (1 choice). Similarly, for every e_t = (v_i, v_j) ∈ E, if σ(v_i) = σ(v_j) = 0, then v_i, v_j ∉ S, and hence the vertex w_t must belong to S in order to be dominated, since its only neighbours are the vertices v_i, v_j (1 choice). In all other cases, w_t can either belong to S or not (2 choices). This justifies (146), thus completing the justification of (145). Note that the purpose of the "bristle" vertices u_1, . . . , u_n is to make the interactions of the edges of G′ independent of each other: if G has edges e_t = (v_i, v_j) and e_{t′} = (v_i, v_{j′}), and amongst {v_i, w_t, v_j} only v_j ∈ S, then u_i has to be in S, so w_{t′} can either be in S or not, independently of w_t and v_j.
This concludes the proof.
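The choice counts above determine (146) as a product of factors of 2, one per vertex with σ(v_i) = 1 and one per edge not having both endpoints at spin 0; this reconstructed form can be checked by brute force on a toy instance (G the triangle, so G′ has 9 vertices):

```python
# Brute-force check of the dominating-set counting on a toy instance:
# G = C3 (the triangle, 2-regular).  G' adds a bristle u_i per vertex and
# subdivides each edge via w_t, as described above.  The form of (146)
# used here is reconstructed from the stated choice counts.
from itertools import product

n, m = 3, 3
G_edges = [(0, 1), (1, 2), (2, 0)]

# Vertices of G': v_0..v_2 -> 0..2, u_0..u_2 -> 3..5, w_0..w_2 -> 6..8.
adj = {x: set() for x in range(2 * n + m)}
def add_edge(a, b):
    adj[a].add(b); adj[b].add(a)
for i in range(n):
    add_edge(i, n + i)                               # bristle u_i - v_i
for t, (i, j) in enumerate(G_edges):
    add_edge(2 * n + t, i); add_edge(2 * n + t, j)   # w_t - v_i, v_j

def num_dominating_sets():
    count = 0
    for bits in product([0, 1], repeat=2 * n + m):
        s = {x for x in range(2 * n + m) if bits[x]}
        if all(x in s or adj[x] & s for x in range(2 * n + m)):
            count += 1
    return count

# The sigma-sum from (146): 2^{#ones} * prod over edges (2 unless both 0).
total = 0
for sigma in product([0, 1], repeat=n):
    weight = 2 ** sum(sigma)
    for (i, j) in G_edges:
        weight *= 1 if sigma[i] == sigma[j] == 0 else 2
    total += weight

assert num_dominating_sets() == total == 185
print(total)
```
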

The Uniqueness Threshold on the Infinite Hypertree
We denote by T_{k,∆} the infinite (∆ − 1)-ary k-uniform hypertree with root vertex ρ. Also, for n = 0, 1, 2, . . ., we denote by T_{k,∆}(n) the subtree of T_{k,∆} induced by the first n levels, i.e., by the set of vertices at distance ≤ n from ρ in T_{k,∆}. We denote by V_n the vertex set of T_{k,∆}(n) and by L_n the leaves of the tree, i.e., the vertices with degree 1 in T_{k,∆}(n). Denote by µ_n the Gibbs distribution of the independent set model on T_{k,∆}(n) (see Section 2.1). For a configuration σ : V_n → {0, 1}, we denote by σ_{L_n} the restriction of σ to the set L_n and by σ_ρ the spin of the root ρ.
Definition 54. Let k ≥ 2 and ∆ ≥ 2 be integers. The independent set model has uniqueness on T_{k,∆} if condition (147) holds.
We will use σ_{L_n} = 1 to denote that, in the configuration σ, all vertices in L_n are assigned the spin 1. For n = 0, 1, 2, . . ., define p_n = µ_n(σ_ρ = 1 | σ_{L_n} = 1). When σ_ρ = 1, in each of the ∆ − 1 hyperedges that include ρ, at least one of the k − 1 vertices other than ρ must have spin 0. When σ_ρ = 0, any configuration on the neighbours of ρ is allowed. By considering the (normalised) weight of such configurations on T_{k,∆}(n + 1) \ ρ, it is not hard to see that the sequence p_n satisfies the recursion (149) for every integer n ≥ 0. For any configuration η : L_n → {0, 1}, we will see that µ_n(σ_ρ = 1 | σ_{L_n} = η) is sandwiched between p_n and p_{n+1}. This yields the following.
It follows that the conditions in (147) and (150) are equivalent, which yields the statement in the lemma. We next show (151). The proof is by induction on n. The claim is trivial for n = 0 since p_0 = 1 and p_1 = 0. So assume that the claim holds for all non-negative integers less than n; we will show it for n. For i ∈ [d] and j ∈ [k − 1], let T_{i,j} be the subtree of T_{k,∆}(n) rooted at v_{i,j} (here d = ∆ − 1 and v_{i,1}, ..., v_{i,k−1} denote the vertices other than ρ in the i-th hyperedge containing ρ). Denote by S_{i,j} the leaves of T_{i,j} and by η_{i,j} the restriction of η to S_{i,j}. Let µ_{T_{i,j}} be the Gibbs distribution of the independent set model on T_{i,j}. Note that ∪_{i∈[d], j∈[k−1]} S_{i,j} = L_n. Finally, let

q_{i,j} := µ_{T_{i,j}}(σ_{v_{i,j}} = 1 | σ_{S_{i,j}} = η_{i,j}),   q := µ_n(σ_ρ = 1 | σ_{L_n} = η).
It is simple to see that

q = ∏_{i∈[d]} (1 − ∏_{j∈[k−1]} q_{i,j}) / (1 + ∏_{i∈[d]} (1 − ∏_{j∈[k−1]} q_{i,j})),

or equivalently that

q / (1 − q) = ∏_{i∈[d]} (1 − ∏_{j∈[k−1]} q_{i,j}).   (152)

For i ∈ [d] and j ∈ [k − 1], note that T_{i,j} is isomorphic to T_{k,∆}(n − 1) and hence we can use the induction hypothesis to bound q_{i,j}. Let us consider first the case where n is odd. Then, we have that

p_n ≤ q_{i,j} ≤ p_{n−1}.   (153)
It follows from (152) and (153) that

p_n / (1 − p_n) = (1 − p_{n−1}^{k−1})^d ≤ q / (1 − q) ≤ (1 − p_n^{k−1})^d = p_{n+1} / (1 − p_{n+1}),   (154)

and hence p_n ≤ q ≤ p_{n+1}; the outer equalities follow from the recursion in (149), and the bounds on q then follow since the function x/(1 − x) is increasing in x. The proof for even n is completely analogous, modulo that the inequalities in (153) and (154) hold in the opposite direction.
This concludes the induction step and hence the proof of (151). The proof of the lemma is thus complete.
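The sandwich property can be checked exactly in the smallest non-trivial case k = 2, ∆ = 3, where T_{2,3}(2) is the depth-2 binary tree and the Gibbs distribution is uniform over independent sets. The sketch below assumes the recursion p_{n+1} = (1 − p_n^{k−1})^{∆−1} / (1 + (1 − p_n^{k−1})^{∆−1}) (a reconstruction, and an assumption of this sketch), which gives p_2 = 1/2 and p_3 = 1/5, and verifies that every boundary configuration η yields a root marginal in [p_3, p_2].

```python
from itertools import product

# T_{2,3}(2) is the depth-2 binary tree: root 0, internal vertices 1-2, leaves 3-6.
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
leaves = [3, 4, 5, 6]

def root_marginal(eta):
    """mu_2(sigma_rho = 1 | sigma on the leaves = eta), by enumerating all
    independent sets of the tree that agree with eta on the leaves."""
    total = occupied = 0
    for bits in product([0, 1], repeat=7):
        if any(bits[u] and bits[v] for u, v in edges):
            continue                                  # not an independent set
        if any(bits[l] != e for l, e in zip(leaves, eta)):
            continue                                  # disagrees with eta
        total += 1
        occupied += bits[0]
    return occupied / total

# p_2 = 1/2 and p_3 = 1/5; for even n = 2 the sandwich reads p_3 <= q <= p_2.
for eta in product([0, 1], repeat=4):
    assert 1/5 <= root_marginal(eta) <= 1/2
assert root_marginal((1, 1, 1, 1)) == 1/2    # all-one boundary attains p_2
assert root_marginal((0, 0, 0, 0)) == 1/5    # all-zero boundary attains p_3
```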
Proof. To see that the function f is decreasing, we calculate

f′(z) = −(∆ − 1)(k − 1) z^{k−2} (1 − z^{k−1})^{∆−2} / (1 + (1 − z^{k−1})^{∆−1})²,   (157)

which clearly shows that f is decreasing for z ∈ [0, 1]. Note that f′(z) = 0 iff z = 0 or z = 1, so in fact f is strictly decreasing over the interval [0, 1]. We next show the second part of the lemma. We can rewrite z = f(z) as g(z) = 0, where

g(z) := z − (1 − z)(1 − z^{k−1})^{∆−1}.

For the function g, we have that g(0) = −1, g(1) = 1 and g is continuous on [0, 1]. It thus follows that there exists x such that g(x) = 0, which implies that f(x) = x. We next prove that x is unique, i.e., that for all z ≠ x it holds that g(z) ≠ 0. For this, it suffices to show that g is increasing on [0, 1], or that g′(z) > 0 for all z ∈ [0, 1]. We calculate

g′(z) = 1 + (1 − z^{k−1})^{∆−1} + (∆ − 1)(k − 1) z^{k−2} (1 − z)(1 − z^{k−1})^{∆−2},

which clearly shows that g′(z) ≥ 1 for all z ∈ [0, 1]. Finally, to see the expression for |f′(x)|, just use (157) and use that (1 − x^{k−1})^{∆−1} = x/(1 − x) (from x = f(x)) to simplify. This proves the second assertion in the lemma.
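The claims in this proof are easy to check numerically. The sketch below assumes the reconstructed forms f(z) = (1 − z^{k−1})^{∆−1}/(1 + (1 − z^{k−1})^{∆−1}) and g(z) = z − (1 − z)(1 − z^{k−1})^{∆−1}; it verifies g(0) = −1, g(1) = 1 and locates the unique fixed point x = f(x) by bisection, which is valid since g is increasing.

```python
def g(z, k, Delta):
    # z = f(z) rearranged: z * (1 + (1 - z^{k-1})^{Delta-1}) = (1 - z^{k-1})^{Delta-1},
    # i.e. g(z) := z - (1 - z) * (1 - z^{k-1})^{Delta-1} = 0.
    return z - (1 - z) * (1 - z ** (k - 1)) ** (Delta - 1)

def fixed_point(k, Delta, iters=100):
    """The unique root of g in [0, 1], found by bisection
    (g is increasing, with g(0) = -1 < 0 < 1 = g(1))."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid, k, Delta) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k, Delta = 3, 6
assert g(0, k, Delta) == -1 and g(1, k, Delta) == 1
x = fixed_point(k, Delta)
r = (1 - x ** (k - 1)) ** (Delta - 1)
assert abs(x - r / (1 + r)) < 1e-12          # x = f(x)
```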
Lemma 57. If |f′(x)| < 1, the independent set model has uniqueness on T_{k,∆}. If |f′(x)| > 1, the independent set model has non-uniqueness on T_{k,∆}.
Proof. Recall the sequence p_n defined in (149). Let p_n^+ = p_{2n} and p_n^− = p_{2n+1}. As a consequence of the fact that f is decreasing (cf. Lemma 56) and p_0^+ = 1, p_0^− = 0, we have that

lim_{n→∞} p_n^+ = p^+ and lim_{n→∞} p_n^− = p^−,   (162)

where p^+, p^− are real numbers in [0, 1]. To see the existence of these limits, note that p_0^+ = 1 ≥ p_1^+ and p_0^− = 0 ≤ p_1^−. Since p_{n+1}^± = f(f(p_n^±)), a simple induction shows that the sequence p_n^+ is decreasing and the sequence p_n^− is increasing. Since both sequences are bounded, we obtain the existence of the limits in (162). For later use, we remark here that the continuity of f and the recursions p_n^− = f(p_n^+), p_{n+1}^+ = f(p_n^−) and p_{n+1}^± = f(f(p_n^±)) imply that p^+, p^− satisfy the equalities

p^+ = f(p^−) and p^− = f(p^+),   (163)

p^+ = f(f(p^+)) and p^− = f(f(p^−)).   (164)
As a consequence of the existence of the limits in (162), we can conclude that the condition lim sup_{n→∞} |p_{n+1} − p_n| = 0 is equivalent to

p^+ = p^−.   (165)

We are now ready to show the equivalence in the lemma. For the first part, assume that |f′(x)| < 1. To show that uniqueness holds on T_{k,∆}, it suffices to show that (165) holds, i.e., that p^+ = p^−. We have that p^+, p^− satisfy (164). From the second part of Lemma 56 and the assumption |f′(x)| < 1, we thus obtain that p^+ = x = p^−, as wanted.
For the second part, it suffices to show the contrapositive. So, assume that uniqueness holds on T_{k,∆}; we will show that |f′(x)| ≤ 1, where x is specified by the relation x = f(x) (cf. the second part of Lemma 56). From Lemma 55, we have that lim sup_{n→∞} |p_{n+1} − p_n| = 0 and hence, by our previous arguments, (165) holds as well. From (163) and the uniqueness of x (cf. Lemma 56), we obtain that the common value of p^+ and p^− is x and thus p_n → x.
By the Mean Value Theorem, for every n there exists ξ_n between p_{n−1} and p_n such that |f′(ξ_n)| = |p_{n+1} − p_n| / |p_n − p_{n−1}|. From p_n → x, we also have that ξ_n → x. Moreover, since p_n → x, for infinitely many n it holds that |p_{n+1} − p_n| ≤ |p_n − p_{n−1}|; for all such n, we have |f′(ξ_n)| ≤ 1. Using that f′ is continuous and ξ_n → x, we obtain that |f′(x)| ≤ 1, as desired.
This concludes the proof of Lemma 57.
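Lemma 57 can be illustrated numerically. The sketch below, again assuming the explicit form f(z) = (1 − z^{k−1})^{∆−1}/(1 + (1 − z^{k−1})^{∆−1}) (a reconstruction, and an assumption of this sketch), computes |f′(x)| by bisection and the limits p^+, p^− by iterating f ∘ f: for k = 3, ∆ = 6 the criterion gives uniqueness and the limits coincide, while for k = 2, ∆ = 6 it gives non-uniqueness and f has an attracting two-cycle p^− < p^+ satisfying (163).

```python
def f(z, k, Delta):
    """Tree recursion (assumed form): f(z) = r/(1+r) with r = (1 - z^{k-1})^{Delta-1}."""
    r = (1 - z ** (k - 1)) ** (Delta - 1)
    return r / (1 + r)

def fixed_point_derivative(k, Delta):
    """|f'(x)| at the unique fixed point x = f(x), found by bisection
    (z - f(z) is increasing, negative at 0 and positive at 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mid < f(mid, k, Delta) else (lo, mid)
    x = (lo + hi) / 2
    D = Delta - 1
    r = (1 - x ** (k - 1)) ** D
    return D * (k - 1) * x ** (k - 2) * (1 - x ** (k - 1)) ** (D - 1) / (1 + r) ** 2

def limits(k, Delta, n=2000):
    """p+ = lim p_{2n} (started from p_0 = 1) and p- = lim p_{2n+1} (from p_1 = 0)."""
    pp, pm = 1.0, 0.0
    for _ in range(n):
        pp, pm = f(f(pp, k, Delta), k, Delta), f(f(pm, k, Delta), k, Delta)
    return pp, pm

# k = 3, Delta = 6: |f'(x)| < 1 and the two limits coincide (uniqueness).
assert fixed_point_derivative(3, 6) < 1
pp, pm = limits(3, 6)
assert abs(pp - pm) < 1e-9

# k = 2, Delta = 6: |f'(x)| > 1; f has an attracting two-cycle p+ > p-
# with p+ = f(p-) and p- = f(p+) (non-uniqueness).
assert fixed_point_derivative(2, 6) > 1
pp, pm = limits(2, 6)
assert pp - pm > 0.1
assert abs(pp - f(pm, 2, 6)) < 1e-9 and abs(pm - f(pp, 2, 6)) < 1e-9
```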
The following lemma establishes the intuitive fact that, for the independent set model, uniqueness on T k,∆ is a monotone property with respect to ∆.
Lemma 58. Let k ≥ 2 be an integer. There exists ∆_c(k) ≥ 3 such that the following holds for all integers ∆ ≥ 2: the independent set model has uniqueness on T_{k,∆} whenever ∆ < ∆_c(k) and non-uniqueness whenever ∆ > ∆_c(k).
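The monotonicity in Lemma 58 can be observed numerically via the criterion of Lemma 57: as ∆ grows with k fixed, |f′(x)| increases and crosses 1 at the threshold ∆_c(k). The sketch below assumes the reconstructed recursion f(z) = (1 − z^{k−1})^{∆−1}/(1 + (1 − z^{k−1})^{∆−1}), which is an assumption of this sketch, and illustrates the crossing for k = 3.

```python
def margin(k, Delta):
    """|f'(x)| at the fixed point of f(z) = (1 - z^{k-1})^D / (1 + (1 - z^{k-1})^D),
    D = Delta - 1; by the criterion of Lemma 57, uniqueness on T_{k,Delta}
    corresponds to margin < 1 and non-uniqueness to margin > 1."""
    D = Delta - 1
    f = lambda z: (1 - z ** (k - 1)) ** D / (1 + (1 - z ** (k - 1)) ** D)
    lo, hi = 0.0, 1.0                 # bisection on z - f(z), which is increasing
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mid < f(mid) else (lo, mid)
    x = (lo + hi) / 2
    return (D * (k - 1) * x ** (k - 2) * (1 - x ** (k - 1)) ** (D - 1)
            / (1 + (1 - x ** (k - 1)) ** D) ** 2)

ms = [margin(3, Delta) for Delta in range(3, 11)]
assert all(a < b for a, b in zip(ms, ms[1:]))   # |f'(x)| increases with Delta
assert margin(3, 6) < 1 < margin(3, 8)          # so Delta_c(3) lies in {7, 8}
```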
Now, for the sake of contradiction, assume that there was ε′ > 0 such that x_k ≤ 1/2 − ε′ for infinitely many k. For all such k, we would have that g(x_k^±) ≥ g(1/2 − ε′) and thus, by taking the lim sup in (172), we obtain a contradiction. This proves that x_k^± → 1/2 as k → ∞, thus concluding the proof of Lemma 60.