Belief Propagation on replica symmetric random factor graph models

According to physics predictions, the free energy of random factor graph models that satisfy a certain "static replica symmetry" condition can be calculated via the Belief Propagation message passing scheme [Krzakala et al., PNAS 2007]. Here we prove this conjecture for two general classes of random factor graph models, namely Poisson random factor graphs and random regular factor graphs. Specifically, we show that the messages, constructed just as in the case of acyclic factor graphs, asymptotically satisfy the Belief Propagation equations and that the free energy density is given by the Bethe free energy formula.


Belief Propagation
Factor graph models are ubiquitous in statistical physics, computer science and combinatorics [19,29]. Formally, a factor graph G = (V(G), F(G), ∂_G, (ψ_a)_{a∈F(G)}) consists of a finite set V(G) of variable nodes, a set F(G) of constraint nodes and a function ∂_G : F(G) → ⋃_{l≥0} V(G)^l that assigns each constraint node a ∈ F(G) a finite sequence ∂a = ∂_G a of variable nodes, whose length is denoted by d(a) = d_G(a). Additionally, there is a finite set Ω of spins and each constraint node a ∈ F(G) comes with a weight function ψ_a : Ω^{d(a)} → (0, ∞). The factor graph gives rise to a probability distribution µ_G, the Gibbs measure, on the set Ω^{V(G)}. Indeed, letting σ(x_1, …, x_k) = (σ(x_1), …, σ(x_k)) for σ ∈ Ω^{V(G)} and x_1, …, x_k ∈ V(G), we define

    µ_G(σ) = ψ_G(σ)/Z_G,  where  ψ_G(σ) = ∏_{a∈F(G)} ψ_a(σ(∂a))  and  Z_G = ∑_{τ∈Ω^{V(G)}} ψ_G(τ)    (1.1)

is the partition function. Moreover, G induces a bipartite graph on V(G) ∪ F(G) in which the constraint node a is adjacent to the variable nodes that appear in the sequence ∂a. By (slight) abuse of notation we just write ∂a = ∂_G a for the set of such variable nodes. Conversely, for x ∈ V(G) we let ∂x = ∂_G x be the set of all a ∈ F(G) such that x ∈ ∂a and we let d(x) = d_G(x) = |∂x|. (However, we keep in mind that the order of the neighbors of a matters, unless the weight function ψ_a is permutation invariant.)

The Potts model on a finite lattice is an example of a factor graph model. In this case the lattice points correspond to the variable nodes and each edge {x, y} of the lattice gives rise to a constraint node a. The spins are Ω = {1, …, q} for some integer q ≥ 2. Moreover, all constraint nodes have the same weight function, namely Ω² → (0, ∞), (s, t) ↦ exp(β·1{s = t}), where β is a real parameter.
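For intuition, the partition function in (1.1) can be evaluated by brute force on tiny instances. The following sketch is our own illustration (the function name and data layout are ours, not the paper's); it is exponential in the number of variables and meant only to make the definition concrete:

```python
import itertools

def partition_function(n, spins, factors):
    """Z_G = sum over sigma in spins^n of prod_a psi_a(sigma(del a)).

    `factors` is a list of (nbrs, psi) pairs: nbrs an ordered tuple of
    variable indices, psi a strictly positive function on spin tuples.
    Exponential in n -- for illustration on tiny instances only.
    """
    Z = 0.0
    for sigma in itertools.product(spins, repeat=n):
        weight = 1.0
        for nbrs, psi in factors:
            weight *= psi(tuple(sigma[x] for x in nbrs))
        Z += weight
    return Z
```

For a single Potts edge with weight ψ(s, t) = exp(β·1{s = t}) on Ω = {1, 2} this returns Z_G = 2e^β + 2.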
Another example is the k-SAT model for some k ≥ 2. The variable nodes x_1, …, x_n correspond to Boolean variables and the constraint nodes a_1, …, a_m to k-clauses. The set of possible spins is Ω = {±1} and each constraint node comes with a k-tuple s^i = (s^i_1, …, s^i_k) ∈ {±1}^k. The weight function is ψ_{a_i} : {±1}^k → (0, ∞), σ ↦ exp(−β·1{σ = s^i}), where β > 0 is a real parameter. Combinatorially, ±1 represent the Boolean values 'true' and 'false' and a_i is a propositional clause on the variables ∂a_i whose j-th variable is negated iff s^i_j = −1.
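A hedged sketch of this weight function (the helper name is ours, not notation from the paper): every assignment has weight 1 except the single sign pattern s, which is penalised by exp(−β).

```python
import itertools, math

def ksat_weight(s, beta):
    """psi^(s): weight 1 for every assignment except sigma = s,
    which is penalised by a factor exp(-beta)."""
    return lambda sigma: math.exp(-beta) if tuple(sigma) == tuple(s) else 1.0

psi = ksat_weight((1, -1, 1), beta=2.0)
values = [psi(sigma) for sigma in itertools.product((1, -1), repeat=3)]
# exactly one of the 2^k assignments is penalised
```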
A key problem associated with a factor graph model is to analytically or algorithmically calculate the "free energy" ln Z_G. Either way, this is notoriously difficult in general [24]. But in the (very) special case that G, viz. the associated bipartite graph, is acyclic it is well known that this problem can be solved via the Belief Propagation equations (see e.g. [21, Ch. 14]). More precisely, for a variable node x and a constraint node a such that x ∈ ∂a let µ_{G,x→a} be the marginal of x with respect to the Gibbs measure of the factor graph G − a obtained from G by deleting the constraint node a. (To be explicit, µ_{G,x→a}(σ) is the probability that x is assigned the spin σ ∈ Ω in a random configuration σ ∈ Ω^{V(G)} drawn from µ_{G−a}.) Similarly, let µ_{G,a→x} be the marginal of x in the factor graph obtained from G by deleting all constraint nodes b ∈ ∂x \ a. We call µ_{G,x→a} the message from x to a and conversely µ_{G,a→x} the message from a to x. If G is acyclic, then for all x ∈ V(G), a ∈ ∂x, σ ∈ Ω,

    µ_{G,x→a}(σ) = ∏_{b∈∂x\a} µ_{G,b→x}(σ) / ∑_{τ∈Ω} ∏_{b∈∂x\a} µ_{G,b→x}(τ),    (1.2)

    µ_{G,a→x}(σ) = ∑_{τ∈Ω^{∂a}} 1{τ(x) = σ} ψ_a(τ) ∏_{y∈∂a\x} µ_{G,y→a}(τ(y)) / ∑_{τ∈Ω^{∂a}} ψ_a(τ) ∏_{y∈∂a\x} µ_{G,y→a}(τ(y)),    (1.3)

and the messages µ_{G,x→a}, µ_{G,a→x} are the unique solution to (1.2), (1.3). In fact, the messages can be computed via a fixed point iteration and the number of iteration steps required is bounded by the diameter of G. Furthermore, ln Z_G is equal to the Bethe free energy, defined in terms of the messages as

    B_G = ∑_{x∈V(G)} ln[∑_{σ∈Ω} ∏_{a∈∂x} µ_{G,a→x}(σ)] + ∑_{a∈F(G)} ln[∑_{τ∈Ω^{∂a}} ψ_a(τ) ∏_{x∈∂a} µ_{G,x→a}(τ(x))] − ∑_{x∈V(G)} ∑_{a∈∂x} ln[∑_{σ∈Ω} µ_{G,x→a}(σ) µ_{G,a→x}(σ)].

(The denominators in (1.2) and (1.3) and the arguments of the logarithms in the Bethe free energy are guaranteed to be positive because we assume that the weight functions ψ_a take strictly positive values.)
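On an acyclic factor graph the fixed point iteration for (1.2)-(1.3) and the message form of the Bethe free energy can be implemented directly. The following minimal brute-force sketch is our own illustration (all names are ours, the update order is the naive one, and each variable is assumed to appear at most once per constraint):

```python
import itertools, math

def belief_propagation(n, spins, factors, iters=50):
    """Iterate the Belief Propagation equations (1.2)-(1.3).

    factors: list of (nbrs, psi); nbrs is an ordered tuple of variable
    indices, psi a positive function on tuples of spins.  Messages are
    dicts spin -> probability, initialised uniformly.
    """
    q = len(spins)
    adj = [[] for _ in range(n)]          # adj[x] = factors containing x
    for a, (nbrs, _) in enumerate(factors):
        for x in nbrs:
            adj[x].append(a)
    edges = [(x, a) for a, (nbrs, _) in enumerate(factors) for x in nbrs]
    m_xa = {e: {s: 1.0 / q for s in spins} for e in edges}   # variable -> factor
    m_ax = {e: {s: 1.0 / q for s in spins} for e in edges}   # factor -> variable
    for _ in range(iters):
        for (x, a) in edges:              # update (1.2)
            new = {s: math.prod(m_ax[(x, b)][s] for b in adj[x] if b != a)
                   for s in spins}
            z = sum(new.values())
            m_xa[(x, a)] = {s: v / z for s, v in new.items()}
        for (x, a) in edges:              # update (1.3)
            nbrs, psi = factors[a]
            i = nbrs.index(x)
            new = {s: 0.0 for s in spins}
            for tau in itertools.product(spins, repeat=len(nbrs)):
                new[tau[i]] += psi(tau) * math.prod(
                    m_xa[(y, a)][tau[j]] for j, y in enumerate(nbrs) if j != i)
            z = sum(new.values())
            m_ax[(x, a)] = {s: v / z for s, v in new.items()}
    return m_xa, m_ax, adj

def bethe_free_energy(n, spins, factors, m_xa, m_ax, adj):
    """The Bethe free energy B_G written in terms of the messages."""
    B = sum(math.log(sum(math.prod(m_ax[(x, a)][s] for a in adj[x]) for s in spins))
            for x in range(n))
    for a, (nbrs, psi) in enumerate(factors):
        B += math.log(sum(psi(tau) * math.prod(m_xa[(y, a)][tau[j]]
                                               for j, y in enumerate(nbrs))
                          for tau in itertools.product(spins, repeat=len(nbrs))))
    B -= sum(math.log(sum(m_xa[e][s] * m_ax[e][s] for s in spins)) for e in m_xa)
    return B
```

On the single-edge Ising factor graph (one constraint with ψ(s, t) = exp(β·1{s = t})) this reproduces ln Z_G = ln(2e^β + 2).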

An example
Consider the following concrete example of the Belief Propagation equations and the Bethe free energy. Let G be the star graph with center vertex x_0 and leaves x_1, x_2, x_3, and consider the 2-color Potts model (that is, the Ising model) on G. The factor graph associated to the model consists of the four variable nodes x_0, …, x_3 and three constraint nodes a_1, a_2, a_3, with a_i joining x_0 to x_i, and each constraint node is given the Potts constraint function ψ(s, t) = exp(β·1{s = t}) for s, t ∈ Ω = {1, 2}. The partition function is Z_G = 2(e^β + 1)^3.
By symmetry, all of the messages µ_{G,x_i→a_i} and µ_{G,a_i→x_i} are simply the uniform distribution (1/2, 1/2) on the two colors. It is easily checked that these messages satisfy the Belief Propagation equations (1.2) and (1.3).
Plugging these messages into the Bethe free energy formula yields B_G = ln 2 + 3 ln(e^β + 1) = ln Z_G, as expected since G is acyclic. The remainder of the paper explores under what conditions the Belief Propagation equations and the Bethe free energy formula can be expected to hold approximately in factor graphs that are not acyclic.
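These calculations are easy to verify numerically. A self-contained check (our own, with β = 0.7 chosen arbitrarily):

```python
import itertools, math

beta = 0.7  # arbitrary illustrative inverse temperature

# Exact partition function of the Ising star: Z_G = 2 (e^beta + 1)^3.
Z = sum(math.exp(beta * sum(s0 == s for s in (s1, s2, s3)))
        for s0, s1, s2, s3 in itertools.product((1, 2), repeat=4))

# Bethe free energy with all messages equal to (1/2, 1/2):
# variable terms + constraint terms - edge terms.
B = (math.log(2 * (1 / 2) ** 3)                    # center x0 (three incoming messages)
     + 3 * math.log(2 * (1 / 2))                   # leaves x1, x2, x3
     + 3 * math.log((2 * math.exp(beta) + 2) / 4)  # constraints a1, a2, a3
     - 6 * math.log(1 / 2))                        # six (variable, constraint) pairs

print(abs(B - math.log(Z)) < 1e-9)  # -> True
```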

Random factor graphs
The present paper is about Gibbs distributions arising from random models of factor graphs. Such models are of substantial interest in combinatorics, computer science and information theory [1,29]. The following setup encompasses a reasonably wide class of models. Let Ω be a finite set of 'spins', let k ≥ 3 be an integer, let Ψ be a finite set of functions ψ : Ω^k → (0, ∞) and let ρ = (ρ_ψ)_{ψ∈Ψ} be a probability distribution on Ψ. Then for an integer n > 0 and a real d > 0 we define the "Poisson" random factor graph G_n = G_n(d, Ω, k, Ψ, ρ) as follows. The set of variable nodes is V(G_n) = {x_1, …, x_n} and the set of constraint nodes is F(G_n) = {a_1, …, a_m}, where m is a Poisson random variable with mean dn/k. Furthermore, independently for each i = 1, …, m a weight function ψ_{a_i} ∈ Ψ is chosen from the distribution ρ. Finally, ∂a_i ∈ {x_1, …, x_n}^k is a uniformly random k-tuple of variables, chosen independently for each i. For fixed d, Ω, k, Ψ, ρ the random factor graph G_n has a property A asymptotically almost surely ('a.a.s.') if lim_{n→∞} P[G_n ∈ A] = 1.

A well known concrete example is the random k-SAT model for k ≥ 2, where we let Ω = {±1} and Ψ = {ψ^{(s)} : s ∈ {±1}^k} with ψ^{(s)} : σ ∈ {±1}^k ↦ exp(−β·1{σ = s}) and ρ the uniform distribution on Ψ. Further prominent examples include the Ising and the Potts models on the Erdős-Rényi random graph [11,12].
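Sampling from this model is straightforward; a sketch (ours, not the paper's; the Knuth-style Poisson sampler is only suitable for moderate means):

```python
import math, random

def poisson(lam, rng):
    """Knuth's Poisson sampler; fine for moderate lam (sketch only)."""
    threshold, count, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return count
        count += 1

def sample_poisson_factor_graph(n, d, k, weight_fns, rho, rng):
    """Draw G_n(d, Omega, k, Psi, rho): m ~ Po(dn/k) constraint nodes, each
    with a weight function drawn from rho and a uniform k-tuple of variables."""
    m = poisson(d * n / k, rng)
    factors = []
    for _ in range(m):
        psi = rng.choices(weight_fns, weights=rho, k=1)[0]
        nbrs = tuple(rng.randrange(n) for _ in range(k))
        factors.append((nbrs, psi))
    return factors

rng = random.Random(0)
G = sample_poisson_factor_graph(30, 3.0, 3, [lambda s: 1.0], [1.0], rng)
```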
As in the general case, it is a fundamental challenge to get a handle on the free energy ln Z_{G_n}. To this end, physicists have proposed the ingenious albeit non-rigorous "cavity method" [22]. The simplest version of this approach, the replica symmetric ansatz, basically treats the random factor graph as though it were acyclic. In particular, the replica symmetric ansatz holds that the "messages" µ_{G_n,x→a}, µ_{G_n,a→x}, defined just as in the tree case as the marginals of the factor graph obtained by removing a resp. ∂x \ a, satisfy the Belief Propagation equations (1.2) and (1.3), at least asymptotically as n → ∞. Moreover, the replica symmetric prediction as to the free energy is nothing but the Bethe free energy B_{G_n}. If so, then Belief Propagation can not just be used as an analytic tool, but potentially also as an efficient "message passing algorithm" [18]. Indeed, the Belief Propagation fixed point iteration has been used algorithmically with considerable empirical success [17].
Under what assumptions can we vindicate the replica symmetric ansatz? Let us write µ_{G,x} for the marginal of a variable node x under µ_G. Moreover, write µ_{G,x,y} for the joint distribution of two variable nodes x, y and let ‖·‖_TV denote the total variation norm. Then the condition

    lim_{n→∞} n^{−2} ∑_{x,y∈V(G_n)} E‖µ_{G_n,x,y} − µ_{G_n,x} ⊗ µ_{G_n,y}‖_TV = 0    (1.4)

expresses that a.a.s. the spins of two randomly chosen variable nodes are asymptotically independent. An important conjecture holds that (1.4) is sufficient for the success of Belief Propagation and the Bethe formula [18].
The main result of this paper proves this conjecture. For a given factor graph G we call the family of messages (µ_{G,x→a}, µ_{G,a→x})_{x∈V(G), a∈∂x} a δ-Belief Propagation fixed point if the equations (1.2) and (1.3) are satisfied up to an error of δ in total variation. If (1.2) holds exactly, then the Bethe free energy can be rewritten in terms of the marginals of the variable and constraint nodes [31]. Specifically, write µ_{G,a} for the joint distribution of the variables ∂a and let

    B_G = ∑_{a∈F(G)} ∑_{τ∈Ω^{∂a}} µ_{G,a}(τ) ln(ψ_a(τ)/µ_{G,a}(τ)) + ∑_{x∈V(G)} (d(x) − 1) ∑_{σ∈Ω} µ_{G,x}(σ) ln µ_{G,x}(σ).

Once more the fact that all ψ ∈ Ψ are strictly positive ensures that B_G is well-defined.

Random regular models
In a second important class of random factor graph models all variable nodes have the same degree d. Thus, with Ω, k, Ψ, ρ as before let G_n = G_{n,reg}(d, Ω, k, Ψ, ρ) be the random factor graph with variable nodes x_1, …, x_n and constraint nodes a_1, …, a_m, m = ⌊dn/k⌋, chosen uniformly from the set of all factor graphs G with d_G(x_i) ≤ d for all i. As before, the weight functions ψ_{a_i} ∈ Ψ are chosen independently from ρ. Clearly, if k divides dn, then all variable nodes have degree exactly d.

REG3
If ∑_{i=1}^{m} |J_i| > dn, then start over from REG1. Otherwise choose G^ε_n uniformly at random subject to the condition that no variable node has degree greater than d.
A practical method to sample G^ε_n uniformly at random is via the "configuration model" [16, Chapter 9]: we create d 'clones' of each variable node and |J_i| clones of each constraint node a_i (keeping the clones ordered), then pick a uniformly random maximum matching between variable node clones and constraint node clones, then collapse the matching to give our random factor graph (x attached to constraint a_i if some clone of x is matched with a clone of a_i). Note that ∑_{i=1}^{m} |J_i| has the distribution Bin(X, 1 − ε), where X is distributed as k times a Po(dn/k) variable. In particular, its mean is (1 − ε)dn and so a Chernoff bound gives

    P[ ∑_{i=1}^{m} |J_i| > (1 − ε/2)dn ] ≤ exp(−Ω(ε²n)).    (1.5)

Then there is a sequence (δ_n)_n → 0 such that the family of messages (µ_{G^ε_n,x→a}, µ_{G^ε_n,a→x}) is a δ_n-Belief Propagation fixed point a.a.s.
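The clone-and-match construction can be sketched as follows (our own illustration; a uniformly random matching of the ordered constraint clones into the dn variable clones is realised by a shuffle):

```python
import random

def configuration_model(n, d, constraint_degs, rng):
    """Match constraint-node clones to variable-node clones uniformly.

    Each of the n variable nodes contributes d clones; constraint a_i
    contributes constraint_degs[i] clones.  Requires
    sum(constraint_degs) <= d * n.  Returns the ordered neighborhood
    (tuple of variable indices) of each constraint node.
    """
    assert sum(constraint_degs) <= d * n
    clones = [x for x in range(n) for _ in range(d)]
    rng.shuffle(clones)                   # uniform matching via shuffle
    out, pos = [], 0
    for deg in constraint_degs:
        out.append(tuple(clones[pos:pos + deg]))
        pos += deg
    return out

nbrs = configuration_model(5, 3, [3, 3, 3], random.Random(1))
```

Because each variable contributes only d clones, no variable node can end up with degree greater than d.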

Non-reconstruction
In physics jargon, factor graph models that satisfy (1.4) resp. (1.6) are called statically replica symmetric. An obvious question is how (1.4) and (1.6) can be established "in practice". One simple sufficient condition is the more geometric notion of non-reconstruction, also known as dynamic replica symmetry in physics. To state it, recall the bipartite graph on the set of variable and constraint nodes that a factor graph induces. This bipartite graph gives rise to a metric on the set of variable and constraint nodes, namely the length of a shortest path. Now, for a factor graph G, a variable node x, an integer ℓ ≥ 1 and a configuration σ ∈ Ω^{V(G)} we let ∇_ℓ(G, x, σ) be the set of all τ ∈ Ω^{V(G)} such that τ(y) = σ(y) for all y ∈ V(G) whose distance from x exceeds ℓ. The random factor graph G_n = G_n(d, Ω, k, Ψ, ρ) or G_n = G^ε_{n,reg}(d, Ω, k, Ψ, ρ) has the non-reconstruction property if

    lim_{ℓ→∞} lim sup_{n→∞} (1/n) ∑_{i=1}^{n} E ⟨ ‖µ_{G_n}[ · | ∇_ℓ(G_n, x_i, σ)]_{x_i} − µ_{G_n,x_i}‖_TV ⟩ = 0,    (1.7)

where the expectation is over the choice of G_n. In words, for large enough ℓ and n the random factor graph G_n has the following property a.a.s. If we pick a variable node x_i uniformly at random and if we pick σ randomly from the Gibbs distribution, then the expected difference between the "pure" marginal µ_{G_n,x_i} of x_i and the marginal of x_i in the conditional distribution given that the event ∇_ℓ(G_n, x_i, σ) occurs diminishes. We contrast (1.7) with the much stronger uniqueness property, which states that the influence of the worst-case boundary condition on the marginal spin distribution of x_i decreases in the limit of large ℓ and n.
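The quantity compared in (1.7) can be computed by brute force on tiny instances. A sketch (ours; names and the `dist` argument are our own simplification) of the conditional marginal µ_G[·|∇_ℓ(G, x, σ)] at x:

```python
import itertools, math

def marginal_given_boundary(x, ell, sigma, dist, n, spins, factors):
    """Marginal of x under mu_G conditioned on the event grad_ell(G, x, sigma):
    all variables at distance > ell from x are pinned to their values in sigma.
    dist[y] = bipartite-graph distance from x to variable node y."""
    far = [y for y in range(n) if y != x and dist[y] > ell]
    weights = {s: 0.0 for s in spins}
    for tau in itertools.product(spins, repeat=n):
        if any(tau[y] != sigma[y] for y in far):
            continue                      # tau not in grad_ell(G, x, sigma)
        w = 1.0
        for nbrs, psi in factors:
            w *= psi(tuple(tau[v] for v in nbrs))
        weights[tau[x]] += w
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

# Ising star from Section 1: the leaves sit at bipartite distance 2 > ell = 1,
# so pinning them to spin 1 biases the center, whose pure marginal is uniform.
beta = 0.5
psi = lambda st: math.exp(beta if st[0] == st[1] else 0.0)
star = [((0, 1), psi), ((0, 2), psi), ((0, 3), psi)]
mu = marginal_given_boundary(0, 1, (1, 1, 1, 1), {0: 0, 1: 2, 2: 2, 3: 2},
                             4, (1, 2), star)
```

Here the conditional marginal of the center is (e^{3β}/(e^{3β}+1), 1/(e^{3β}+1)), so the star does not satisfy the corresponding fixed-ℓ decorrelation; on random graphs the boundary effect decays as ℓ grows.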
Non-reconstruction is a sufficient but not a necessary condition for (1.4) and (1.6). For instance, in the random graph coloring problem (1.4) is satisfied in a much wider regime of parameters than (1.7) [8,18,23].

Discussion and related work
The main results of the present paper match the predictions from [18] and thus provide a fairly comprehensive vindication of Belief Propagation. To the extent that Belief Propagation and the Bethe free energy are not expected to be correct if the conditions (1.4) resp. (1.6) are violated [18,21], the present results seem to be best possible.
In combination with Lemma 1.6 the main results facilitate the "practical" use of Belief Propagation to analyze the free energy. For instance, Theorem 1.4 and Corollary 1.5 allow for a substantially simpler derivation of the condensation phase transition in the regular k-SAT model than in the original paper [4]. Although non-trivial, it is practically feasible to study Belief Propagation fixed points on random factor graphs; see e.g. [4,5].
Additionally, as Theorems 1.1 and 1.4 show that the "correct" messages are an asymptotic Belief Propagation fixed point, these results probably go as far as one can hope for in terms of a generic explanation of the algorithmic success of Belief Propagation. The missing piece in order to actually prove that the Belief Propagation fixed point iteration converges rapidly is basically an analysis of the "basin of attraction". However, this will likely have to depend on the specific model.
We always assume that the weight functions ψ_a associated with the constraint nodes are strictly positive. But this is partly out of convenience (to ensure that all the quantities that we work with are well-defined, no questions asked). For instance, it is straightforward to extend the present arguments to the hard-core model on independent sets (details omitted).
In an important paper, Dembo and Montanari [11] made progress towards putting the physics predictions on factor graphs, random or not, on a rigorous basis. They proved, inter alia, that a certain "long-range correlation decay" property reminiscent of non-reconstruction is sufficient for the Belief Propagation equations to hold on a certain class of factor graphs whose local neighborhoods converge to trees [11,Theorem 3.14]. Following this, under the assumption of Gibbs uniqueness along an interpolating path in parameter space, Dembo, Montanari, and Sun [13] verified the Bethe free energy formula for locally tree-like factor graphs with a single weight function and constraint nodes of degree 2. Based on these ideas Dembo, Montanari, Sly and Sun [12] verified the Bethe free energy prediction for the ferromagnetic Potts model on regular tree-like graphs at any temperature.
The present paper builds upon the "regularity lemma" for measures on discrete cubes from [3]. In combinatorics, the "regularity method", which developed out of Szemerédi's regularity lemma for graphs [30], has become an indispensable tool. Bapst and Coja-Oghlan [3] adapted Szemerédi's proof to measures on a discrete cube, such as the Gibbs measure of a (random) factor graph, and showed that this result can be combined with the "second moment method" to calculate the free energy under certain assumptions. While these assumptions are (far) more restrictive than our conditions (1.4) and (1.6), [3] deals with more general factor graph models.
Furthermore, inspired by the theory of graph limits [20], Coja-Oghlan, Perkins and Skubch [9] put forward a "limiting theory" for discrete probability measures to go with the regularity concept from [3]. They applied this concept to the Poisson factor graph model from Section 1.2 under the assumption that (1.4) holds and that the Gibbs measure converges in probability to a limiting measure (in the topology constructed in [9]). While these assumptions are stronger and more complicated to state than (1.4), [9] shows that the limiting Gibbs measure induces a "geometric" Gibbs measure on a certain infinite random tree. Moreover, this geometric measure satisfies a certain fixed point relation reminiscent of the Belief Propagation equations.
Additionally, the present paper builds upon ideas from Panchenko's work [26,27,28]. In particular, we follow [26,27,28] in using the Aizenman-Sims-Starr scheme [2] to calculate the free energy. Moreover, although Panchenko only deals with Poisson factor graphs, the idea of percolating the regular factor graph is inspired by his "cavity coordinates" as well as the interpolation argument of Bayati, Gamarnik and Tetali [6]. Other applications of the cavity method to computing the free energy of Gibbs distributions on lattices include [14].
The paper [27] provides a promising approach towards a general formula for the free energy in Poisson random factor graph models. Specifically, [27] yields a variational formula for the free energy under the assumption that the Gibbs measure satisfies a "finite replica symmetry breaking" condition, which is more general than (1.4). Another assumption of [27] is that the weight functions of the factor graph model must satisfy certain "convexity conditions" to facilitate the use of the interpolation method, which is needed to upper-bound the free energy. However, it is conceivable that the interpolation argument is not necessary if (1.4) holds and that Corollary 1.3 could be derived along the lines of [27] (although this is not mentioned in the paper). In any case, the main point of the present paper is to justify the Belief Propagation equations, which are at the very core of the physicists' "cavity method" in factor graph models, and to obtain a formula for the free energy in terms of "messages".
Finally, the proof of Lemma 1.6 is a fairly straightforward extension of the proof of [9,Proposition 3.4]. That proof, in turn, is a generalization of an argument from [25]. For more on non-reconstruction thresholds in random factor graph models see [7,10,15,23].

Outline
After introducing some notation and summarizing the results from [3] that we build upon in Section 2, we prove Theorem 1.1 and Corollaries 1.2 and 1.3 in Section 3. Section 4 then deals with Theorem 1.4 and Corollary 1.5. Finally, the short proof of Lemma 1.6 can be found in Section 5.

Preliminaries
For an integer l ≥ 1 we let [l] = {1, …, l}. When using O(·)-notation we refer to the asymptotics as n → ∞ by default. We say two sequences of probability distributions Q_n and P_n are mutually contiguous if for every sequence (A_n) of events we have P_n[A_n] = o(1) if and only if Q_n[A_n] = o(1). Throughout the paper we denote by d, Ω, k, Ψ, ρ the parameters of the factor graph models from Section 1. We always assume d, Ω, k, Ψ, ρ remain fixed as n → ∞.
For a finite set X we let P(X) be the set of all probability measures on X, which we identify with the set of all maps p : X → [0, 1] such that ∑_{ω∈X} p(ω) = 1. If µ ∈ P(X^S) for some finite set S ≠ ∅, then we write τ^µ, σ^µ, σ^µ_1, σ^µ_2, … for independent samples chosen from µ. We omit the superscript where possible. Furthermore, if X : (X^S)^l → R is a random variable, then we write ⟨X⟩_µ for the expectation of X with respect to µ^{⊗l}. We reserve the symbols E[·], P[·] for other sources of randomness such as the choice of a random factor graph. Moreover, for a set ∅ ≠ U ⊆ S, ω ∈ X and σ ∈ X^S we let σ[ω|U] = |{u ∈ U : σ(u) = ω}|/|U|. Thus, σ[·|U] = (σ[ω|U])_{ω∈X} ∈ P(X) is the distribution of the spin σ(u) for a uniformly random u ∈ U.

We use the "regularity lemma" for discrete probability measures from [3]. Let us fix a finite set X for the rest of this section. If V = (V_1, …, V_l) is a partition of some set V, then we call #V = l the size of V. Moreover, for ε > 0 such a partition satisfies, among other conditions, the following: HM4 µ is ε-regular with respect to V. An (ε, l)-state of µ is a set S ⊂ X^n such that µ(S) > 0. We call µ (ε, l)-symmetric if the entire cube X^n is an (ε, l)-state.

Lemma 2.5.
For any ε > 0 there exist ξ > 0 and n_0 > 0 such that for any n > n_0 the following holds.
Choose ξ > 0 sufficiently small and assume that n is large enough. With (V, S) and j as above, set ν = µ[·|S_j] for brevity.
Finally, we recall the following folklore fact about Poisson random factor graphs.

Poisson factor graphs
Throughout this section we fix the parameters (d, Ω, k, Ψ, ρ).

Proof of Theorem 1.1
We begin with the following lemma that will prove useful in Section 4 as well.
Lemma 3.1. For any integer L > 0 and any α > 0 there exist ε = ε(α, L, Ψ) > 0, n 0 = n 0 (ε, L) such that the following is true. Suppose that G is a factor graph with n > n 0 variable nodes such that ψ a ∈ Ψ * for all a ∈ F (G). Moreover, assume that µ G is (ε, 2)-symmetric. If G + is obtained from G by adding L constraint nodes b 1 , . . . , b L with weight functions ψ b 1 , . . . , ψ b L ∈ Ψ * arbitrarily, then µ G + is (α, 2)symmetric and Proof. Because all functions ψ ∈ Ψ are strictly positive, there exists δ = δ(L, Ψ) > 0 such that for any ψ 1 , . . . , ψ L ∈ Ψ * the following is true. Suppose that ψ i : We claim that µ G [ · |S j ] is ε/δ 2 -regular w.r.t. V for all j ∈ J . Indeed, suppose that µ G + is ε-regular on V i and let U ⊂ V i be a subset of size |U | ≥ ε|V i |. Because G + is obtained from G by adding L constraint nodes, the definition (1.1) of the Gibbs measure and the choice (3.4) of δ ensure that Moreover, by HM2 and the triangle inequality for any j ∈ J we have In combination with Lemma 2.4 and the ε/δ 2 -regularity of µ G [ · |S j ], (3.5) implies that S j is an (ε , 2)-state of µ G for every j ∈ J , provided that ε = ε(ε ) > 0 was chosen small enough. In addition, (3.4) implies that µ G (S j ) ≥ δ 2 ε/N for all j ∈ J . Consequently, Corollary 2.3 and our assumption (1.4) entail that for each provided ε = ε (ε ) > 0 is sufficiently small and n > n 0 is large enough. Further, by Lemma 2.5 and ε/δ 2 -regularity, Hence, by (3.6) and the triangle inequality, Analogously, we obtain from Lemma 2.5 that Combining (3.7) and (3.8) and using the triangle inequality, we obtain Moreover, combining (3.3) and (3.9) and applying the triangle inequality once more, we find Thus, HM4 and Lemma 2.4 imply that µ G + is (α, 2)-symmetric, provided that ε was chosen small enough.

(3.16)
To this end, let U = ⋃_{j≥2} ∂b_j be the set of all variable nodes that occur in the constraint nodes b_2, …, b_∆. Because µ_{G′,x_n→b_1} is the marginal of x_n in the factor graph G′ − b_1, the definition (1.1) of the Gibbs measure entails that for any σ ∈ Ω the identity (3.17) holds.

Proof of Corollary 1.2
Following Aizenman-Sims-Starr [2] we are going to show that the difference E[ln Z_{G_n}] − E[ln Z_{G_{n−1}}] converges to the Bethe free energy (3.24). The assertion then follows by summing over n. To prove (3.24) we will couple the random variables Z_{G_{n−1}}, Z_{G_n} by way of a third random factor graph Ĝ; a similar coupling was used in [9]. Specifically, let Ĝ be the random factor graph with variable nodes V(Ĝ) = {x_1, …, x_n} obtained by including m̂ independent random constraint nodes, where m̂ = Po(dn/(kp)) with p = ((n−1)/n)^{k−1}. For each constraint node a of Ĝ the weight function ψ_a is chosen from the distribution ρ independently. Further, let G′ be a random graph obtained from Ĝ by deleting each constraint node with probability 1 − p independently. Let A be the (random) set of constraints removed from Ĝ to obtain G′. In addition, obtain G″ from Ĝ by selecting a variable node x uniformly at random and removing all constraints a ∈ ∂_Ĝ x along with x itself. Then G′ is distributed as G_n and G″ is distributed as G_{n−1} plus an isolated variable. Thus,
Then the construction of G ensures that Instead of thinking of G as being obtained fromĜ by removing X random constraints, we can think ofĜ as being obtained from G by adding X independent random constraint nodes a 1 , . . . , a X . More precisely, let G 0 = G and G i = G i −1 + a i for i ∈ [X ]. Then given X the triple (G ,Ĝ, A) has the same distribution as (G ,G X , {a 1 , . . . , a X }).
Moreover, because pd n/k = d n/k, G has the same distribution as G n . Therefore, our assumption (1.4) implies that G is (o(1), 2)-symmetric a.a.s. Hence, Lemma 3.1 implies that G i −1 retains (o(1), 2)-symmetry a.a.s. for any 1 ≤ i ≤ min{X , L}. Consequently, Corollary 2.2 implies that G i −1 is (o(1), k)-symmetric a.a.s. Since ∂b i is chosen uniformly and independently of b 1 , . . . , b i −1 , Markov's inequality thus shows that for every 1 ≤ i ≤ min{X , L}, provided n is big enough. Further, since the constraints (a i ) i ∈[X ] are chosen independently and because µĜ ,y→a i (τ(y)) is the marginal in the factor graph without a i , (3.1) and (3.30) imply that P ∀i ∈ [X ] : Hence, with probability at least 1 − 3ε the bound holds for all i ∈ [X ] simultaneously. Further, the definition (1.1) of the partition function entails that for any i ∈ [X ], Thus, if (3.31) holds and if δ is chosen sufficiently small, then Finally, the assertion follows by taking logarithms and summing over i = 1, . . . , X . Proof. Given ε > 0 let L = L(ε) > 0 be a large enough, let γ = γ(ε, L) > δ = δ(γ) > 0 be small enough and assume that n is sufficiently large. Letting X = |∂Ĝ x|, we can pick L large enough so that As in the previous proof, we turn the tables: we think ofĜ as being obtained from G by adding a new variable node x and X independent random constraint nodes a 1 , . . . , a X such that x ∈ ∂a i for all i .

Corollary 3.5. A.a.s. we have ln
Proof. Let a_1, …, a_X be the constraint nodes adjacent to x and let U = ⋃_{i=1}^{X} ∂_Ĝ a_i. With probability 1 − O(1/n), for all 1 ≤ i < j ≤ X we have ∂a_i ∩ ∂a_j \ {x} = ∅. If so, then Lemma 3.4 entails the corresponding bound on ln Z_Ĝ for all i ∈ [X]. Plugging (3.38) into (3.37), we see that a.a.s.
Combining Lemma 3.3 and Corollary 3.5, we see that a.a.s.Ĝ is such that Moreover, by our assumption and Fact 3.2 the r.h.s. converges to B in probability. Thus, Corollary 1.2 follows by taking the expectation overĜ.

Proof of Corollary 1.3
We begin by deriving formulas for the variable and constraint marginals in terms of the messages.
Lemma 3.6. We have

Proof. Proceeding along the lines of the proof of Theorem 1.1, we let G′ be the random factor graph on x_1, …, x_n containing m = Po(dn(1 − 1/n)^k/k) random constraint nodes that do not touch x_n. Obtain G″ from G′ by adding ∆ = Po(dn(1 − (1 − 1/n)^k)/k) random constraint nodes b_1, …, b_∆ that contain x_n, so that G″ is distributed as G_n. Let U = ⋃_{i=1}^{∆} ∂b_i. In complete analogy to (3.17) we obtain the corresponding formula. Hence, with ν_i(σ) from (3.20) we see that a.a.s.
Lemma 3.7. We have

Proof. Obtain G′ from G_n by adding a single random constraint node a.
Then the distribution of the pair (G′, a) is at total variation distance O(1/n) from the distribution of the pair (G_n, a), where a is a random constraint node of G_n given F(G_n) ≠ ∅. Therefore, it suffices to prove the estimate for this pair. The assumption (1.4) and Corollary 2.2 imply that a.a.s. µ_{G_n} is (o(1), k)-symmetric. Hence, because ∂_G a is random, a.a.s. we have |µ_{G_n,∂a}(σ) − ∏_{x∈∂a} µ_{G_n,x}(σ(x))| = o(1) for all σ ∈ Ω^{∂a}. Since µ_{G_n,x} = µ_{G′,x→a} for all x ∈ ∂a, this means that a.a.s. the desired approximation holds.

Essentially, we will prove Corollary 1.3 by following the steps of the derivation of the corresponding formula for acyclic factor graphs [21, Chapter 14]. We just need to allow for error terms that come in because the right hand sides of (3.6) and (3.7) are o(1) rather than 0 (like in the acyclic case). Specifically, by Lemma 3.7, a.a.s. for all but o(n) constraint nodes a ∈ F(G_n) the constraint marginal approximately factorizes as above. Further, by Lemma 3.6, a.a.s. for all but o(n) variable nodes the variable marginal is given in terms of the messages. Hence, Fact 2.6 implies that a.a.s. for all but o(n) constraint nodes a ∈ F(G_n) the corresponding estimate holds. Thus, Corollary 1.3 follows from Corollary 1.2.
We prove Theorem 1.4 and Corollary 1.5 by adapting the proofs of Theorem 1.1 and Corollary 1.2 to the regular factor graph model. In the proofs in Section 3 we exploited the Poisson nature of the factor graphs to determine the effect of adding or removing a few constraint and/or variable nodes. Here the necessary wiggle room is provided by the "ε-percolation" of the otherwise rigid d -regular model G n . This enables a broadly similar analysis to that of Section 3. However, some of the details are subtle, most notably the coupling required for the Aizenman-Sims-Starr argument in Section 4.2.

(4.2)
Let ∆ = d_{G^ε_n}(x_n). Then 0 ≤ ∆ ≤ d, and Pr[∆ = 0] = Ω(ε^d) by REG2-REG3. Let G′ be the random factor graph obtained from G^ε_n by deleting all constraint nodes a such that x_n ∈ ∂a. Then the distribution of G′ is at total variation distance O(1/n) from the distribution of G^ε_n given that ∂x_n = ∅. Therefore, the assumption (1.6) and Corollary 2.2 imply that G′ fails to be (η, dk)-symmetric with probability o(1). Our conditioning on ∑_{a∈F(G′)} d_{G′}(a) ≤ (1 − ε/2)dn ensures that all variable nodes of G′ have degree at most d a.a.s. Hence, the distribution of G′ is at total variation distance o(1) from the distribution of G^ε_n given ∆.
As in the Poisson case we just need to prove the corresponding estimate. If we again let U = ⋃_{j≥2} ∂b_j be the set of all variable nodes joined to the constraints b_2, …, b_∆, then since µ_{G′,x_n→b_1} is the marginal of x_n in the factor graph G′ − b_1 and µ_{G′,b_i→x_n} is the marginal of x_n in G′ + b_i, we obtain the analogues (4.8) of equations (3.17) and (3.18). Further, given that ∑_{a∈F(G′)} d_{G′}(a) ≤ (1 − ε/2)dn, the distribution q is such that 1/(d|R|) ≤ q(x) ≤ 1/|R|. Hence, q is "within a factor of d" of being uniform. In effect, we can choose η > 0 so small that our assumption that G′ is (η, dk)-symmetric ensures that with probability at least 1 − η^{1/3} we have

    ∑_{τ∈Ω^U} | ⟨1{∀y ∈ U \ {x_n} : σ(y) = τ(y)}⟩_{G′} − ∏_{y∈U} µ_{G′,y}(τ(y)) | < η^{1/3}.    (4.9)

Due to (4.3) and (4.4) we obtain the assertion from (4.7)-(4.9) by following the proof of Theorem 1.1 verbatim from (3.19).

Proof of Corollary 1.5
As in the proof of Corollary 1.2 we couple G^ε_{n+1} and G^ε_n via a common super-graph Ĝ obtained as follows. Choose m̂ from the distribution d + Po(d(n+1)/k) conditional on the event that km̂ < dn. Then choose Ĝ with variable nodes x_1, …, x_{n+1} and constraint nodes â_1, …, â_m̂ from the distribution of G^ε_{n+1} given that |F(G^ε_{n+1})| = m̂. Proof. Construct a copy of G^ε_{n+1} by generating m = Po(d(n+1)/k). Conditioned on m̂ = m, the distributions of Ĝ and G^ε_{n+1} are identical, and so the claim follows from the contiguity of the two Poisson variables m and m̂.
Obtain G′ from Ĝ by removing d random constraint nodes.
Proof. Couple the distributions as follows. Let m = Po(d(n+1)/k). Choose m constraints with independent random weight functions from Ψ according to ρ, and choose a set of active slots J, including each slot with probability 1 − ε. Attach the active slots of all constraints to the n + 1 variable nodes uniformly at random, conditioned on no variable node having degree more than d. This construction yields G^ε_{n+1} on the event A that the total number of active slots is at most dn. Now add d additional random constraint nodes, with random sets of active slots as above, and attach them to variable nodes at random in proportion to the deficit of their degrees from d. On the event A, this yields the distribution of Ĝ. Now remove d constraints at random: the constraints remaining are still matched to uniformly random variable nodes, and so the distribution is that of G′. This coupling succeeds if A holds, and a Chernoff bound similar to (1.5) bounds the failure probability. Furthermore, obtain G″ from Ĝ as follows.
• Select a random variable node x of Ĝ.
• Remove x and all constraint nodes adjacent to x.
Proof. It is not the case that G″ is distributed exactly as G^ε_n: the clauses adjacent to x have a different degree distribution than clauses drawn uniformly from Ĝ (for instance, none of them has degree 0). Nevertheless, we will show that the two distributions are close enough that we can use G″ in the Aizenman-Sims-Starr scheme. We will construct two factor graphs H, H′ with variable nodes {x_1, …, x_n} on the same probability space simultaneously such that the following properties hold: 1. Up to total variation distance exp(−Ω(ε²n)), H is distributed as G″ and H′ is distributed as G^ε_n.
Because the set Ψ of possible weight functions is fixed and all ψ ∈ Ψ are strictly positive, we have ln Z_H, ln Z_{H′} = O(n) with certainty. For the same reason, adding or removing a single constraint can only alter ln Z_H, ln Z_{H′} by some constant C. Therefore, the assertion is immediate from properties (1)–(3).
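The constant C admits an explicit bound; a sketch using only that Ψ is a fixed finite set of strictly positive functions, with ψ_min and ψ_max the smallest and largest values attained by functions in Ψ:

```latex
% Adding a constraint node a multiplies every summand of the partition
% function by a factor \psi_a(\sigma(\partial a)) \in [\psi_{\min}, \psi_{\max}]:
\psi_{\min} Z_H \le Z_{H + a} \le \psi_{\max} Z_H
\quad\Longrightarrow\quad
|\ln Z_{H+a} - \ln Z_H| \le C := \max\{|\ln\psi_{\min}|,\, |\ln\psi_{\max}|\}.
```

Since the number of constraint nodes is O(n) with certainty, iterating this bound also gives ln Z_H = O(n).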
To construct the coupling, we first couple the degree sequences of the constraints of H, H′ in such a way that with probability 1 − O(ε) the sequences are identical, and otherwise they differ in at most 2d places. Formally, let m̂ = d + Po(d(n+1)/k) and let k̂ = (k̂_1, …, k̂_m̂) ∈ {0, 1, …, k}^m̂ be a vector with the same distribution as the vector (d_Ĝ(â_1), …, d_Ĝ(â_m̂)) of constraint degrees of Ĝ. Then (1.5) implies that k̂ is distributed as a sequence of independent Bin(k, 1 − ε) variables, up to total variation distance exp(−Ω(ε²n)). Further, let X = (X_i)_{i=0,1,…,k} be distributed as the statistics of the degrees of the d constraint nodes deleted from Ĝ in the above construction of G′, given that d_Ĝ(â_j) = k̂_j for all j; that is, X_i is the number of deleted constraint nodes of degree i. Similarly, let X′ = (X′_i)_{i=0,1,…,k} be the statistics of d elements of the sequence k̂ chosen uniformly without replacement.
Let A be the event that (4.10) holds. Then by REG2 and the Chernoff bound we have P[A] ≥ 1 − exp(−Ω(ε²n)). To couple H, H′ on the event A we make the following two observations.
• P[X_k = d | A] = 1 − O(ε); for (4.10) implies that the total number of variable nodes adjacent to a constraint node of degree less than k is bounded by 3εkn.
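The O(ε) bound in this observation is consistent with the degree model; a sketch, assuming the d deleted constraint nodes have approximately independent Bin(k, 1 − ε) degrees:

```latex
\Pr[X_k = d] \approx \Pr\bigl[\mathrm{Bin}(k, 1-\varepsilon) = k\bigr]^{\,d}
  = (1-\varepsilon)^{kd} \ge 1 - kd\,\varepsilon = 1 - O(\varepsilon).
```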
Consequently, on A we can couple X, X′ such that P[X ≠ X′] = O(ε). If X = X′, then we choose D = D′ ⊂ [m̂] uniformly at random subject to the condition that Σ_{i∈D} 1{k̂_i = j} = X_j for all j = 0, 1, …, k. Otherwise we choose two independent random sets D, D′ ⊂ [m̂] with Σ_{i∈D} 1{k̂_i = j} = X_j and Σ_{i∈D′} 1{k̂_i = j} = X′_j for all j. Further, with (ξ_i)_{i≥1} a sequence of Be(1/(n+1)) random variables that are mutually independent and independent of everything else, let

Proof. We mimic the proof of Lemma 3.3. Let η = η(ε) > δ = δ(η) > 0 be small enough and assume that n > n_0(δ) is sufficiently large. Instead of thinking of G′ as being obtained from Ĝ by removing d random constraints, we can think of Ĝ as being obtained from G′ by adding d random constraint nodes a_1, …, a_d. More precisely, let G_0 = G′ and In addition, (3.1) yields P[∀i ∈ [d]:] Hence, a.a.s. for all i ∈ [d] simultaneously, µ_{Ĝ,a_i→y}(τ(y)) < δ (4.12). As (4.12) implies that a.a.s.
The assertion follows by taking logarithms and summing. ∂_Ĝ x). Then a.a.s.
Proof. Given δ > 0, let γ = γ(ε, δ) > η = η(γ) > 0 be small enough and assume that n > n_0(γ) is sufficiently large. We can think of Ĝ as being obtained from G by adding a new variable node x, X ≤ d random constraint nodes a_1, …, a_X such that x ∈ ∂a_i for all i ≤ X, and another Y random constraint nodes a_{X+1}, …, a_{X+Y} such that x ∉ ∂a_i for i > X. Let U = ∪_{i≤X+Y} ∂a_i. Since G has at least γn variables of degree less than d a.a.s., Claim 4. ψ_{a_i}(τ(∂a_i)), (4.13) shows that a.a.s.
To complete the proof of Corollary 1.5 we take ε → 0 slowly. We begin with the following observation.

Claim 4.7.
We have (1/n) E[ln Z_{G_n}] = (1/n) E[ln Z_{G^ε_n}] + O(ε). Proof. We recall the following Lipschitz property, which is immediate from (1.1): if a factor graph G′ is obtained from another factor graph G by adding or removing a single constraint node, then |ln Z_{G′} − ln Z_G| ≤ C for some fixed number C = C(Ψ). We can couple G_n and G^ε_n by forming G_0 by choosing m = Po((1 − ε)^k dn/k) random constraints, joined at random to variable nodes so that no variable node has degree more than d. To form G_n from G_0 we add dn/k − m additional random constraints; with probability 1 − exp(−Ω(ε²n)) the number of additional constraints is O(εn). To form G^ε_n from G_0, we add Po((k choose j)(1 − ε)^j ε^{k−j} dn/k) random constraints of degree j, for j = 1, …, k − 1. Again with probability 1 − exp(−Ω(ε²n)) the total number of additional constraints is O(εn). Applying the Lipschitz property twice gives the claim.
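Both correction steps in this coupling add only O(εn) constraints in expectation; a sketch of the binomial bookkeeping behind the two applications of the Lipschitz property:

```latex
% Step 1: completing G_0 to the undiluted graph adds, in expectation,
dn/k - \mathbb{E}[m] = \bigl(1 - (1-\varepsilon)^k\bigr)\, dn/k \le \varepsilon\, dn .
% Step 2: adding the constraints of degrees j = 1, \dots, k-1 contributes,
% in expectation,
\sum_{j=1}^{k-1} \binom{k}{j} (1-\varepsilon)^j \varepsilon^{k-j}\, \frac{dn}{k}
  = \bigl(1 - (1-\varepsilon)^k - \varepsilon^k\bigr) \frac{dn}{k}
  \le k\varepsilon \cdot \frac{dn}{k} = \varepsilon\, dn .
```

Together with the Lipschitz bound of C per added constraint, this yields the O(ε) error term after dividing by n.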
Proof of Theorem 1.5. Let X = |F(G′) △ F(G″)| be the number of constraint nodes in which G′, G″ differ. As in the previous proof, we know deterministically that |ln Z_{G′} − ln Z_{G″}| ≤ C X. Moreover, the construction of G′, G″ ensures that X has bounded mean. Therefore, Markov's inequality and Lemma 4.6 ensure that E[ln(Z_{G′}/Z_{G″})] = (n + 1)^{−1} E[B_Ĝ] + O(ε). Hence, by (4.11), Claim 4.1 and because ln Z_{G^ε_n} = O(n) with certainty,