Sampling in Uniqueness from the Potts and Random-Cluster Models on Random Regular Graphs

We consider the problem of sampling from the Potts model on random regular graphs. It is conjectured that sampling is possible when the temperature of the model is in the uniqueness regime of the regular tree, but positive algorithmic results have been for the most part elusive. In this paper, for all integers $q\geq 3$ and $\Delta\geq 3$, we develop algorithms that produce samples within error $o(1)$ from the $q$-state Potts model on random $\Delta$-regular graphs, whenever the temperature is in uniqueness, for both the ferromagnetic and antiferromagnetic cases. The algorithm for the antiferromagnetic Potts model is based on iteratively adding the edges of the graph and resampling a bichromatic class that contains the endpoints of the newly added edge. Key to the algorithm is how to perform the resampling step efficiently since bichromatic classes may induce linear-sized components. To this end, we exploit the tree uniqueness to show that the average growth of bichromatic components is typically small, which allows us to use correlation decay algorithms for the resampling step. While the precise uniqueness threshold on the tree is not known for general values of $q$ and $\Delta$ in the antiferromagnetic case, our algorithm works throughout uniqueness regardless of its value. In the case of the ferromagnetic Potts model, we simplify the algorithm significantly by utilising the random-cluster representation of the model. In particular, we show that a percolation-type algorithm succeeds in sampling from the random-cluster model with parameters $p,q$ on random $\Delta$-regular graphs for all values of $q\geq 1$ and $p<p_c(q,\Delta)$, where $p_c(q,\Delta)$ corresponds to a uniqueness threshold for the model on the $\Delta$-regular tree. When restricted to integer values of $q$, this yields a simplified algorithm for the ferromagnetic Potts model on random $\Delta$-regular graphs.


Introduction
Random constraint satisfaction problems have been thoroughly studied in computer science in an effort to analyse the limits of satisfiability algorithms and understand the structure of hard instances. Analogously, understanding spin systems on random graphs [27,28,34,4,26,6,13,8,9] gives insights about the complexity of counting and the efficiency of approximate sampling algorithms. In this paper, we design approximate sampling algorithms for the Potts model on random regular graphs.
The Potts model is a fundamental spin system studied in statistical physics and computer science. The model has two parameters: an integer q ≥ 3, which represents the number of states/colours of the model, and a real parameter B > 0, which corresponds to the so-called "temperature". We denote the set of colours by [q] := {1, . . . , q}. For a graph G = (V, E), configurations of the model are all possible assignments of colours to the vertices of the graph. Each assignment σ : V → [q] has a weight w_G(σ) which is determined by the number m(σ) of monochromatic edges under σ; namely, w_G(σ) = B^{m(σ)}. The Gibbs distribution µ_G is defined on the space of all configurations σ and is given by µ_G(σ) = w_G(σ)/Z_G, where Z_G = Σ_σ w_G(σ). We also refer to µ_G as the Potts distribution; the quantity Z_G is known as the partition function. Well-known models closely related to the Potts model are the Ising and colourings models. The Ising model is the special case q = 2 of the Potts model, while the q-colourings model is the "zero-temperature" case B = 0 of the Potts model, where the distribution is supported on the set of proper q-colourings.
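For intuition, on very small graphs the Gibbs distribution above can be computed exactly by brute-force enumeration. The following sketch (our illustration, not code from the paper) enumerates all q^|V| configurations of a triangle and normalises by the partition function Z_G.

```python
from itertools import product

def potts_gibbs(num_vertices, edges, q, B):
    """Brute-force the Potts distribution mu_G: each sigma gets weight B^{m(sigma)},
    where m(sigma) counts monochromatic edges; normalise by Z_G."""
    weights = {}
    for sigma in product(range(q), repeat=num_vertices):
        m = sum(1 for (u, v) in edges if sigma[u] == sigma[v])
        weights[sigma] = B ** m
    Z = sum(weights.values())
    return {sigma: w / Z for sigma, w in weights.items()}, Z

# Triangle with q = 3 colours and ferromagnetic B = 2:
# 3 monochromatic configs (weight 8), 18 with one monochromatic edge (weight 2),
# 6 proper colourings (weight 1), so Z = 24 + 36 + 6 = 66.
mu, Z = potts_gibbs(3, [(0, 1), (1, 2), (0, 2)], q=3, B=2)
print(Z)  # 66
```

Of course, this enumeration is only feasible for toy instances; the whole point of the paper is to sample without computing Z_G.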
The behaviour of the Potts model has significant differences depending on whether B is smaller or larger than 1. When B < 1, configurations where most neighbouring vertices have different colours have large weight and the model is called antiferromagnetic; in contrast, when B > 1, configurations where most neighbouring vertices have the same colours have large weight and the model is called ferromagnetic. One difference between the two cases that will be relevant later is that the ferromagnetic Potts model admits a random-cluster representation; the details of this representation are given in Section 2.1.
Sampling from the Potts model is a problem that is frequently encountered in running simulations in statistical physics or inference tasks in computer science. To determine the efficiency and accuracy of sampling methods, it is relevant to consider the underlying phase transitions, which signify abrupt changes in the properties of the Gibbs distribution when the underlying parameter changes. The so-called uniqueness phase transition captures the sensitivity of the state of a vertex to fixing far-away boundary conditions. As an example, in the case of the ferromagnetic Potts model on the ∆-regular tree, uniqueness holds when root-to-leaves correlations in the Potts distribution vanish as the height of the tree goes to infinity; it is known that this holds iff B < B_c(q, ∆), where B_c(q, ∆) is the "uniqueness threshold" (cf. (2) for its value). Connecting the uniqueness phase transition with the performance of algorithms is a difficult task that is largely under development. This connection is well-understood on the grid, where it is known that the mixing time of local Markov chains, such as the Glauber dynamics, switches from polynomial to exponential at the corresponding uniqueness threshold, see for example [23,22,32,1,24,2].
For random ∆-regular graphs or, more generally, graphs with maximum degree ∆, the uniqueness threshold on the ∆-regular tree becomes relevant. For certain two-state models, such as the ferromagnetic Ising model and the hard-core model, it has been proved that Glauber dynamics mixes rapidly when the underlying parameter is in the uniqueness regime of the regular tree, and that the dynamics mixes slowly otherwise (see [27,28,9]). The same picture is conjectured to hold for the Potts model as well, but this remains open. For the ferromagnetic case in particular, Bordewich, Greenhill, and Patel [4] prove rapid and slow mixing results for Glauber dynamics on random regular graphs and graphs with maximum degree ∆ when the parameter B is within a constant factor of the uniqueness threshold on the regular tree. More generally, there has been significant progress in recent years in understanding the complexity of sampling from the Gibbs distribution in two-state systems, but for multi-state systems progress has been slower, especially on the algorithmic side. In this paper, for all integers q ≥ 3 and ∆ ≥ 3, we design approximate sampling algorithms for the q-state Potts model on random ∆-regular graphs (regular graphs with n vertices chosen uniformly at random), when the parameter B lies in the uniqueness regime of the regular tree, for both the ferromagnetic and antiferromagnetic cases. Our algorithms are not based on a Markov chain approach but proceed by iteratively adding the edges of the graph and performing a resampling step at each stage. As such, our algorithms can produce samples that are within error 1/n^δ from the Potts distribution for some fixed constant δ > 0 (which depends on B, q, ∆).

Remark 1. There are certain "bad" ∆-regular graphs where the algorithms will fail to produce samples with the desired accuracy; saying that the algorithms work on random ∆-regular graphs means that the number of these "bad" graphs with n vertices is a vanishing fraction of all ∆-regular graphs with n vertices for large n. Moreover, we can recognise the "good" graphs (where our algorithms will successfully produce samples with the desired accuracy) in polynomial time.
Our approach is inspired by Efthymiou's algorithm [7,8] for sampling q-colourings on G(n, d/n); the algorithm there also proceeds by iteratively adding the edges of the graph and exploits the uniqueness on the tree to show that the sampling error is small. However, for the antiferromagnetic Potts model, the resampling step turns out to be significantly more involved and we need a substantial amount of work to ensure that it can be carried out efficiently, as we explain in detail in Section 3. Nevertheless, for the ferromagnetic case, we manage to give a far simpler algorithm by utilising the random-cluster representation of the model (see Section 2.1). In particular, we demonstrate that a percolation-type algorithm succeeds in sampling approximately from the random-cluster model with parameters p, q on random ∆-regular graphs for all values of q ≥ 1 and p < p_c(q, ∆), where p_c(q, ∆) corresponds to a uniqueness threshold for the model on the ∆-regular tree. When restricted to integer values of q, this yields a simple algorithm for the ferromagnetic Potts model on random ∆-regular graphs.
To conclude this introductory section, we remark that, for many antiferromagnetic spin systems on random graphs, typical configurations in the Gibbs distribution display absence of long-range correlations even beyond the uniqueness threshold, up to the so-called reconstruction threshold [25,14]. Note that uniqueness guarantees the absence of long-range correlations under a "worst-case" boundary, while non-reconstruction only asserts the absence of long-range correlations under "typical" boundaries; it is wide open whether this weaker notion is in fact sufficient for sampling on random graphs. On an analogous note, for the ferromagnetic Potts model on random regular graphs, the structure of typical configurations can be fairly well understood using probabilistic arguments for all temperatures (see, e.g., [6,13]) and it would be very interesting to exploit this structure for the design of sampling algorithms beyond the uniqueness threshold. In this direction, Jenssen, Keevash and Perkins [19] very recently designed such an algorithm for all sufficiently large B that works more generally on expander graphs (see also [17] for similar-flavored results on the grid).

Definitions and Main Results
We first review in Section 2.1 the definition of the random-cluster model. In Section 2.2, we state results from the literature about uniqueness on the regular tree for the Potts and random-cluster models. Then, in Section 2.3, we state our algorithmic results for the ferromagnetic Potts and random-cluster models and, in Section 2.4, our result for the antiferromagnetic Potts model.

The random-cluster model
The random-cluster model has two parameters p ∈ [0, 1] and q > 0; note that q in this case can take non-integer values. For a graph G = (V, E), we denote the random-cluster distribution on G by ϕ_G; this distribution is supported on the set of all edge subsets. In particular, for S ⊆ E, let k(S) be the number of connected components in the graph G′ = (V, S) (isolated vertices do count). Then, the weight of the configuration S is given by w_G(S) = p^{|S|} (1 − p)^{|E\S|} q^{k(S)}, and ϕ_G(S) is proportional to w_G(S). Following standard terminology, each edge in S will be called open, while each edge in E\S will be called closed.
For integer values of q, there is a well-known connection between the random-cluster and ferromagnetic Potts models, with the parameters related by p = 1 − 1/B, as detailed below (Lemma 2).
• Let S ⊆ E be distributed according to the RC distribution ϕ G with parameters p, q. Consider the configuration σ obtained from S by assigning each component in the graph (V, S) a random colour from [q] independently. Then, σ is distributed according to the Potts distribution µ G with parameter B.
• Conversely, suppose that σ : V → [q] is distributed according to the Potts distribution µ G with parameter B. Consider S ⊆ E obtained by adding to S each monochromatic edge under σ with probability p independently. Then, S is distributed according to the RC distribution ϕ G with parameters p, q.
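The two directions of this coupling can be sketched as follows (our illustration, with hypothetical helper names; the parameter correspondence p = 1 − 1/B is the standard one for the weight convention w_G(σ) = B^{m(σ)}):

```python
import random

def components(n, S):
    """Connected components of the graph ([n], S), via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in S:
        parent[find(u)] = find(v)
    comps = {}
    for v in range(n):
        comps.setdefault(find(v), []).append(v)
    return list(comps.values())

def rc_to_potts(n, S, q, rng=random):
    """RC -> Potts: colour each component of (V, S) uniformly from [q]."""
    sigma = [None] * n
    for comp in components(n, S):
        c = rng.randrange(q)
        for v in comp:
            sigma[v] = c
    return sigma

def potts_to_rc(edges, sigma, p, rng=random):
    """Potts -> RC: keep each monochromatic edge independently with probability p."""
    return [e for e in edges if sigma[e[0]] == sigma[e[1]] and rng.random() < p]
```

By construction, the output of rc_to_potts is constant on every open component, and potts_to_rc only ever opens monochromatic edges, matching the two bullet points above.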
Lemma 2 allows us to translate our sampling algorithm for the random-cluster model to a sampling algorithm for the ferromagnetic Potts model. The benefit of working with the random-cluster model (instead of the Potts model) is that the random-cluster distribution satisfies certain monotonicity properties (cf. Lemma 19) which simplify significantly the analysis of the algorithm.

Uniqueness for Potts and random-cluster models on the tree
In this section, we review uniqueness on the tree for the Potts and random-cluster models. We start with the Potts model.
For a configuration σ and a set U, we denote by σ_U the restriction of σ to the set U; in the case of a single vertex u, we simply write σ_u to denote the colour of u. Denote by T_∆ the infinite (∆ − 1)-ary tree with root vertex ρ and, for an integer h ≥ 0, denote by T_h the subtree of T_∆ induced by the vertices at distance ≤ h from ρ. Let L_h be the set of leaves of T_h.

Definition 3. Let B > 0, and let q, ∆ ≥ 3 be integers. The q-state Potts model with parameter B has uniqueness on the infinite (∆ − 1)-ary tree if, for all colours c ∈ [q], it holds that lim_{h→∞} µ_{T_h}(σ_ρ = c | σ_{L_h} ≡ c) = 1/q. It has non-uniqueness otherwise.
For the ferromagnetic q-state Potts model (B > 1), it is known that uniqueness holds on the tree iff B < B_c(q, ∆), where B_c(q, ∆) is the threshold in (2). For the antiferromagnetic Potts model (B < 1), the uniqueness threshold on the tree is not yet known in full generality. It is a folklore conjecture that the model has uniqueness iff q ≥ ∆ and B ∈ (0, 1), or q < ∆ and B ≥ (∆ − q)/∆. It is known that the model has non-uniqueness when B < (∆ − q)/∆ [12]. Establishing the uniqueness side of the conjecture is more difficult; this has been established recently in [11] for small values of q and ∆. In the case q = 3, [11] also established the uniqueness threshold for all ∆: for ∆ ≥ 4, uniqueness holds iff B ∈ [(∆ − 3)/∆, 1) and, for ∆ = 3, uniqueness holds iff B ∈ (0, 1). For the q-colourings model (B = 0), Jonasson [21], building on work of Brightwell and Winkler [5], established that the model has uniqueness iff q > ∆.
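Uniqueness in the sense of Definition 3 can be probed numerically via the standard tree recursion for the root marginal under the all-c boundary: writing x_h for the ratio of the root's probability of colour c to that of any other colour in T_h, one level of the (∆ − 1)-ary tree gives x_{h+1} = ((B x_h + q − 1)/(x_h + B + q − 2))^{∆−1}, and uniqueness corresponds to x_h → 1. A sketch of this iteration (our illustration, not code from the paper):

```python
def tree_ratio_iteration(q, Delta, B, iters=200):
    """Iterate the (Delta-1)-ary tree recursion for the Potts model, starting
    from the all-c boundary on the leaves; uniqueness corresponds to x -> 1."""
    d = Delta - 1
    x = B ** d  # height-1 tree: a leaf fixed to colour c contributes B vs 1 per edge
    for _ in range(iters):
        x = ((B * x + q - 1) / (x + B + q - 2)) ** d
    return x

# q = 3, Delta = 3: uniqueness is known for all B in (0, 1), so x should tend to 1.
print(tree_ratio_iteration(3, 3, 0.5))
```

For parameters on the non-uniqueness side, the same iteration settles at a fixed point different from 1 (or oscillates), which is exactly the failure of the limit in Definition 3.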
Remark 4. To summarise the above, a necessary condition for uniqueness on T_∆ in the antiferromagnetic q-state Potts model with parameter B ∈ (0, 1) is that B ≥ (∆ − q)/∆. It is also conjectured that this condition is sufficient, but this has only been established for q = 3.
Uniqueness for the random-cluster model on the tree is less straightforward to define. Häggström [16] studied uniqueness of random-cluster measures on the infinite (∆ − 1)-ary tree where all infinite components are connected "at infinity"; we review his results in more detail in Section 7.1. He showed that, for all q ≥ 1, a sufficient condition for uniqueness is that p < p_c(q, ∆), where the critical value p_c(q, ∆) is given in (3).¹ Note that the critical values in (2) and (3) are connected for integer values of q via p_c(q, ∆) = 1 − 1/B_c(q, ∆). Häggström [16] also conjectured that uniqueness for the random-cluster model holds on T_∆ when p > q/(q + ∆ − 2) for all q ≥ 1; this remains open but progress has been made in [20].

We should note here that the bounds appearing in Remarks 4 and 5 will be useful to simplify some of our arguments. However, the stronger assumption that the parameters are in the uniqueness region of the ∆-regular tree is crucial for the analysis of our algorithm and, in particular, the proofs of the upcoming Lemmas 18 and 27 hinge on this assumption.

¹ In [16], p_c(q, ∆) is defined in a different way, but the two definitions are equivalent for all q ≥ 1.

Sampling ferro Potts and random-cluster models on random regular graphs
We begin by stating our result for the random-cluster model on random regular graphs.

Theorem 6. Let ∆ ≥ 3, q ≥ 1 and p < p_c(q, ∆). Then, there exists a constant δ > 0 such that, as n → ∞, the following holds with probability 1 − o(1) over the choice of a random ∆-regular graph G = (V, E) with n vertices.
There is a polynomial-time algorithm which, on input the graph G, outputs a random set S ⊆ E whose distribution ν_S is within total variation distance O(1/n^δ) from the RC distribution ϕ_G with parameters p, q, i.e., ‖ν_S − ϕ_G‖_TV = O(1/n^δ).
We remark that a simple implementation of the algorithm in Theorem 6 runs in time O*(n^{6/5}), see Figure 1 for details. The constant δ > 0 that controls the error of the algorithm depends on p, q, ∆ and gets smaller as p approaches p_c(q, ∆).
For integer values of q, Theorem 6 combined with the translation between the random-cluster and Potts models (cf. Lemma 2) yields a sampling algorithm for the ferromagnetic q-state Potts model on random regular graphs. Since uniqueness for the ferromagnetic Potts model holds iff B < B_c(q, ∆) and p_c(q, ∆) = 1 − 1/B_c(q, ∆), we therefore have the following corollary of Theorem 6.
Corollary 7. Let ∆ ≥ 3, q ≥ 3 and B > 1 be in the uniqueness regime of the (∆ − 1)-ary tree. Then, there exists a constant δ > 0 such that, as n → ∞, the following holds with probability 1−o(1) over the choice of a random ∆-regular graph G = (V, E) with n vertices.
There is a polynomial-time algorithm which, on input the graph G, outputs a random assignment σ : V → [q] whose distribution ν_σ is within total variation distance O(1/n^δ) from the Potts distribution µ_G with parameter B, i.e., ‖ν_σ − µ_G‖_TV = O(1/n^δ).

Sampling antiferro Potts on random ∆-regular graphs
The algorithm of Corollary 7 for the ferromagnetic Potts model does not extend to the antiferromagnetic case since there is no analogous connection with the random-cluster model in this case. Nevertheless, we are able to design a sampling algorithm on random regular graphs when the parameter B is in uniqueness via a far more elaborate approach which consists of recolouring (large) bichromatic colour classes.
Theorem 8. Let ∆ ≥ 3 and q ≥ 3 be integers, and let B ∈ (0, 1) with B ≠ (∆ − q)/∆ be in the uniqueness regime of the (∆ − 1)-ary tree. Then, there exists a constant δ > 0 such that, as n → ∞, the following holds with probability 1 − o(1) over the choice of a random ∆-regular graph G = (V, E) with n vertices.

There is a polynomial-time algorithm which, on input the graph G, outputs a random assignment σ : V → [q] whose distribution ν_σ is within total variation distance O(1/n^δ) from the Potts distribution µ_G with parameter B, i.e., ‖ν_σ − µ_G‖_TV = O(1/n^δ).

Note that the sampling algorithm in the antiferromagnetic case works throughout uniqueness apart from the point (∆ − q)/∆, where uniqueness on the tree is expected to hold but the model is conjectured to be at criticality. In particular, for all B ≠ (∆ − q)/∆ which are in the uniqueness regime of T_∆, it can be shown that the decay on the tree is exponentially small in its height. In contrast, even if uniqueness on the tree holds for B = (∆ − q)/∆, it can be shown that the decay on the tree is only polynomial in its height.
We remark here that the algorithm for the antiferromagnetic case uses as a black-box a subroutine for sampling from the antiferromagnetic Ising model. The running time of this subroutine, which is based on correlation decay methods, is n^c for some constant c = c(q, B, ∆) > 0; it is an open question whether there is a faster algorithm for the Ising model. Finally, the constant δ that controls the error of the algorithm depends on B, q, ∆ and gets smaller as B decreases.

Proof Approach
In this section, we outline the main idea behind the algorithms of Theorems 6 and 8, and the key obstacles that we have to address. We focus on the antiferromagnetic Potts model where the details are much more complex and discuss how we get the simplification for the ferromagnetic case via the random-cluster model later.
Definition 9. For an n-vertex graph G = (V, E) with maximum degree ∆, a cycle is short if its length is at most (1/5) log_{∆−1} n, and is long otherwise.
Let G be a random ∆-regular graph with n vertices. Following the approach of Efthymiou [8], our algorithm starts from the subgraph of G consisting of all short cycles, which we denote by G ′ . It is fairly standard to show that, with probability 1 − o(1) over the choice of G, the subgraph G ′ is a disjoint union of short cycles, see Lemma 12. It is therefore possible to sample a configuration σ ′ on G ′ which is distributed according to the Potts distribution µ G ′ (exactly). This can be accomplished in several ways; in fact, since the cycles are disjoint and each cycle has logarithmic length, this initial sampling step can even be done via brute force in polynomial time (though it is not hard to come up with much faster algorithms).
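Since the short cycles are disjoint and of logarithmic length, one concrete way to perform this initial step is to enumerate all q^ℓ configurations of each cycle and sample one proportionally to its weight B^{m(σ)}; q^{O(log n)} is polynomial in n. A sketch for a single cycle (our illustration), cross-checked against the standard transfer-matrix identity Z = tr(T^ℓ) with T_{ab} = B^{1[a=b]}:

```python
import random
from itertools import product

def sample_potts_cycle(ell, q, B, rng=random):
    """Exact sample from the Potts distribution on a cycle with ell vertices,
    by enumerating all q^ell configurations (feasible for ell = O(log n))."""
    configs = list(product(range(q), repeat=ell))
    weights = [B ** sum(1 for i in range(ell) if s[i] == s[(i + 1) % ell])
               for s in configs]
    return rng.choices(configs, weights=weights, k=1)[0]

def cycle_partition_function(ell, q, B):
    """Z for the cycle via the transfer matrix T_{ab} = B^{[a=b]}: Z = tr(T^ell).
    T has eigenvalue B + q - 1 (once) and B - 1 (with multiplicity q - 1)."""
    return (B + q - 1) ** ell + (q - 1) * (B - 1) ** ell
```

The transfer-matrix formula also gives a much faster O(ℓ)-time exact sampler if needed; the brute-force version above is just the simplest correct implementation.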
After this initial preprocessing, the algorithm then proceeds by adding sequentially the edges that do not belong to short cycles. At each step, the current configuration is updated with the aim to preserve its distribution close to the Potts distribution of the new graph (with the edge that we just added). Key to this update procedure is a resampling step which is performed only when the endpoints of a newly added edge {u, v} happen to have the same colours under the current configuration; intuitively, some action is required in this case because the weight of the current configuration reduces by a factor of B < 1 in the new graph (because of the added edge). The resampling step consists of recolouring a bichromatic class, where the latter is defined as follows.
Definition 10. Let G = (V, E) be a graph and σ : V → [q] be a configuration. For colours c 1 , c 2 ∈ [q], let σ −1 (c 1 , c 2 ) be the set of vertices that have either colour c 1 or colour c 2 under σ.
For distinct colours c 1 , c 2 ∈ [q], we say that U = σ −1 (c 1 , c 2 ) is the (c 1 , c 2 )-colour-class of σ and that U is a bichromatic class under σ. We refer to a connected component of G[U ] as a bichromatic component.
In the proper colourings case (B = 0), Efthymiou [8] demonstrated that the resampling step when adding an edge e = {u, v} can be done by just flipping the colours of a bichromatic component chosen uniformly at random among those containing one of the vertices u and v (say u). The rough idea there is that, when the colourings model is in uniqueness, the bichromatic components on a random graph are typically small in size. At the same time, by the initial preprocessing step, the edge e = {u, v} does not belong to a short cycle and therefore u and v are far away in the graph without e. Hence, u and v are unlikely to belong to the same bichromatic component and the flipping step will succeed in giving u and v different colours with good probability.
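The flipping step for colourings can be sketched in a few lines (a minimal illustration of this idea, with our own helper names): find the bichromatic component of u by BFS inside σ^{-1}(c1, c2) and swap the two colours on it.

```python
from collections import deque

def flip_bichromatic_component(adj, sigma, u, c1, c2):
    """Swap colours c1 <-> c2 on the connected component of u in the subgraph
    induced by the vertices coloured c1 or c2. Modifies sigma in place."""
    if sigma[u] not in (c1, c2):
        return
    comp, queue, seen = [], deque([u]), {u}
    while queue:
        v = queue.popleft()
        comp.append(v)
        for w in adj[v]:
            if w not in seen and sigma[w] in (c1, c2):
                seen.add(w)
                queue.append(w)
    for v in comp:
        sigma[v] = c2 if sigma[v] == c1 else c1
```

For example, on the path 0–1–2–3 with colours [0, 1, 0, 2], flipping at u = 0 with (c1, c2) = (0, 1) recolours vertices 0, 1, 2 to [1, 0, 1] and leaves vertex 3 (coloured 2) untouched.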
Unfortunately, this flipping method does not work for the antiferromagnetic Potts model. It turns out that when q < ∆, and even when the Potts model is in uniqueness, bichromatic components can be large and therefore u and v potentially belong to the same bichromatic component. To make matters worse, these bichromatic components can be quite complicated (with many short/long cycles). This necessitates a more elaborate approach in our setting to succeed in giving u and v different colours without introducing significant bias to the sampler.
The key to overcoming these obstacles lies in the observation that the assignment of the two colours in a bichromatic component follows the Ising distribution, see Observation 22 for the precise formulation. Hence we can hope to use an approximate sampling algorithm for the Ising model in the resampling step. The natural implementation of this idea however fails: known algorithms for the antiferromagnetic Ising model, based on correlation decay, work as long as B > (∆ − 2)/∆, where ∆ is the maximum degree of the graph [30,35]. In general, this inequality is not satisfied for us, i.e., there exist B in the uniqueness regime for Potts such that B < (∆ − 2)/∆. Fortunately, we can employ fairly recent technology for two-state models [27,31,29] which demonstrates that the graph parameter that matters is not actually the maximum degree of the graph but rather the "average growth" of the graph. While we cannot apply any of the existing results in the literature directly, adapting these ideas to the antiferromagnetic Ising model is fairly straightforward, using results from Mossel and Sly [27]. The more difficult part in our setting is proving that the average growth of the bichromatic components that we consider for resampling is indeed small for "typical" configurations σ (note that in the worst case, the whole graph can be a bichromatic class, which has too large an average growth for our purposes, so a probability estimate over σ is indeed due). Let us first formalise the notion of average growth that we use.
Definition 11. Let M, b be positive constants and G = (V, E) be a graph with n vertices. We say that G has average growth b up to depth L = ⌈M log n⌉ if for all vertices v ∈ V the total number of paths with L vertices starting from v is less than b^L.
The notion of average growth is similar to the notion of connective constant for finite graphs used in [31,29]; the reason for the slightly different definition is that we will need an explicit handle on the constant M controlling the depth. Note that, since we only consider paths with a fixed logarithmic length, this places a lower bound on the accuracy of the sampling algorithm. Nevertheless, by choosing the constant M sufficiently large, this will still be sufficient to make the error of our sampler polynomially small. In particular, as long as the inequality b(1 − B)/(1 + B) < 1 is satisfied, using results from [27], we obtain an approximate sampler for the antiferromagnetic Ising model with parameter B on graphs of average growth b up to depth L = ⌈M log n⌉, see Theorem 24 for details.
We give a few more technical details on how we bound the average growth of bichromatic classes. Here, we utilise the tree uniqueness and the tree-like structure of random ∆-regular graphs (cf. Lemma 15) to provide an upper bound on the number of bichromatic paths. For paths of logarithmic length L, we show in Lemma 27 that the probability that a path is bichromatic is ≤ K^L, where K is roughly (1 + B)/(B + q − 1). Since there are at most ∆(∆ − 1)^{L−2} paths with L vertices, we therefore obtain that the average growth b of bichromatic components is bounded above by (∆ − 1)K. When B is in uniqueness, we have that B > (∆ − q)/∆, and therefore the inequality b(1 − B)/(1 + B) < 1 that is required for the Ising sampler is satisfied. The final technical piece is to bound the error that is introduced by the resampling steps. The placement of a new edge {u, v} reweights the probability that u and v have different colours and introduces an error in our sampling algorithm that is captured by the correlation between the colours of u and v (see Lemma 30). The main idea at this point is that, in the graph without the edge {u, v}, u and v are far apart (since the edge {u, v} does not belong to a short cycle of G), which can be used to show that the correlation between the colours of u and v is relatively small. In Lemma 25, we show that the correlation between u and v can in fact be upper bounded as a weighted sum over paths connecting u and v. This allows us to bound the aggregate sampling error of the algorithm as a weighted sum over short cycles of the random graph G, which can in turn be bounded using a simple expectation argument.
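As a quick sanity check of the parameter regime (using the stated approximation K ≈ (1 + B)/(B + q − 1)), the two bounds combine as follows:

```latex
b \,\le\, (\Delta-1)K \,\approx\, (\Delta-1)\,\frac{1+B}{B+q-1},
\qquad\text{so}\qquad
b\,\frac{1-B}{1+B} \,\le\, \frac{(\Delta-1)(1-B)}{B+q-1} < 1
\;\Longleftrightarrow\;
(\Delta-1)(1-B) < B+q-1
\;\Longleftrightarrow\;
B > \frac{\Delta-q}{\Delta},
```

which is precisely the necessary uniqueness condition from Remark 4.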
The algorithm that we described for the antiferromagnetic Potts model can actually be adapted to the ferromagnetic case as well. However, as mentioned earlier, we follow a different (and surprisingly simpler) route using the random-cluster representation of the model. At a very rough level, the reason behind the simplification is that the components in the random-cluster model provide a much better grip on capturing the properties of the Potts distribution than the bichromatic-component proxy we used earlier. Indeed, just as we described in the antiferromagnetic case, bichromatic components for the ferromagnetic Potts model can also be linear-sized. However, once we translate the Potts configuration to its random-cluster representation (cf. Lemma 2), the components in the latter are small in size (when the model is in the uniqueness region p < p c (q, ∆)) and therefore vertices that are far away do not belong to the same component. This allows us to perform the resampling step in the random-cluster model by a simple percolation procedure. The details can be found in Section 5.
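To illustrate the percolation-type step, recall the conditional law of a single edge in the random-cluster model: given the status of all other edges, e = {u, v} is open with probability p if u and v are already joined by an open path, and with probability p/(p + q(1 − p)) otherwise (opening e then merges two components, costing a factor q). The following sketch (ours, not the paper's exact procedure) adds edges one by one using this rule with a union-find structure; sampling each new edge from its conditional given only the previously placed edges is merely approximately correct, and controlling exactly this error is what the analysis does.

```python
import random

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def edge_open_prob(connected, p, q):
    """Conditional probability that an edge is open in the random-cluster model,
    given whether its endpoints are already connected by other open edges."""
    return p if connected else p / (p + q * (1 - p))

def sequential_rc(n, edges, p, q, rng=random):
    """Add edges one by one, opening each with its conditional probability:
    an approximate percolation-type sampler, not an exact one."""
    uf, S = UnionFind(n), []
    for u, v in edges:
        if rng.random() < edge_open_prob(uf.find(u) == uf.find(v), p, q):
            S.append((u, v))
            uf.union(u, v)
    return S
```

Note that for q = 1 the conditional probability is p regardless of connectivity, recovering independent bond percolation.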

Organisation
In Section 4, we give the properties of random regular graphs that we are going to use in the analysis of our sampling algorithms. In Section 5, we give the algorithm for the random-cluster model and conclude Theorem 6 (assuming the upcoming Lemma 18). In Section 6, we give the algorithm for the antiferromagnetic Potts model and conclude Theorem 8 (assuming the upcoming Theorem 24 and Lemmas 25 and 27). In Sections 7 and 8, we analyse the random-cluster and antiferromagnetic Potts models on "tree-like" graphs and give the proofs of Lemmas 18 and 27. Finally, in Section 9, we prove Theorem 24 and Lemma 25 which are about correlation decay and sampling for the antiferromagnetic Ising model on graphs of small average growth.

Properties of random regular graphs
In this section, we state and prove structural properties of random ∆-regular graphs which ensure that our algorithms for the random-cluster and Potts models have the desired accuracy (cf. Remark 1). While the exact statements of the properties that we need do not seem to appear in the literature, their proofs follow fairly standard techniques in the area.
We will work in the configuration model, see [18,Chapter 9] for more details. Precisely, for ∆n even, let G := G n,∆ denote the uniform distribution on ∆-regular graphs which is obtained by taking a perfect matching of the set [n] × [∆] and collapsing for each u ∈ [n] the elements (u, 1), . . . , (u, ∆) into a single vertex u; the elements of the set [n]×[∆] are called points. Technically, the distribution G is supported on multigraphs but it can be shown that the probability that G ∼ G is simple is asymptotically a positive constant as n → ∞; conditioned on that event, G is uniformly distributed over ∆-regular graphs with n vertices, and therefore any event that holds with probability 1 − o(1) in G n,∆ also holds with probability 1 − o(1) over the uniform distribution on ∆-regular graphs with n vertices.
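The configuration model itself is straightforward to implement: shuffle the ∆n points of [n] × [∆], pair them up consecutively, and collapse each vertex's points. A sketch (our illustration):

```python
import random

def configuration_model(n, Delta, rng=random):
    """Sample a Delta-regular multigraph on [n] from the configuration model:
    a uniformly random perfect matching of the points [n] x [Delta], collapsed."""
    assert (n * Delta) % 2 == 0
    points = [(u, i) for u in range(n) for i in range(Delta)]
    rng.shuffle(points)  # a uniformly random order induces a uniform perfect matching
    # pair consecutive points and keep only vertex labels (may create loops/multi-edges)
    return [(points[2 * j][0], points[2 * j + 1][0]) for j in range(n * Delta // 2)]
```

Counting loops twice, every vertex has degree exactly ∆; rejecting until the multigraph is simple (which happens with constant probability) yields a uniformly random ∆-regular graph.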
The following lemma guarantees that short cycles are disjoint in a random ∆-regular graph.
Lemma 12. Let ∆ ≥ 3 be an integer. Then, with probability 1 − o(1) over the choice of a uniformly random ∆-regular graph with n vertices, any two distinct cycles of length ≤ (1/5) log_{∆−1} n are disjoint, i.e., they do not share any common vertices or edges.
Proof. For convenience, let ℓ := (1/5) log_{∆−1} n and G := G_{n,∆}. Let G ∼ G and E be the event that G contains two distinct cycles of length ≤ ℓ which are not disjoint. Let also F be the event that, for some integer k ∈ [1, 2ℓ], G contains a subgraph with k vertices and k + 1 edges such that each vertex has at least two incident multiedges (in the subgraph); we call such a subgraph bad. When E occurs, we obtain that F also occurs, so it suffices to upper bound the probability of the latter.
The following lemma guarantees that certain weighted sums over cycles are small; this bound will be used to show that the aggregate error of our samplers is small (cf. Section 3).
There are ℓ!(n choose ℓ) = n(n − 1) · · · (n − ℓ + 1) ways to choose and order u_1, . . . , u_ℓ and (∆(∆ − 1))^ℓ ways to choose i_1, . . . , i_{2ℓ}, for a total of ℓ!(n choose ℓ)(∆(∆ − 1))^ℓ possible tuples; this overcounts the number of tuples corresponding to distinct cycles by a factor of 2ℓ (the number of ways to root and orient the 2ℓ-tuple). Now, the pairing corresponding to a tuple occurs with probability 1/((∆n − 1)(∆n − 3) · · · (∆n − (2ℓ − 1))). Since 2(∆n − 1)(∆n − 3) · · · (∆n − (2ℓ − 1)) ≥ ∆^ℓ n(n − 1) · · · (n − ℓ + 1) for all ℓ ∈ [1, n] and ∆ ≥ 3, it follows that E_G[C_ℓ] ≤ (∆ − 1)^ℓ/ℓ, and hence (4) follows, which concludes the proof.

Our next lemma captures the tree-like structure of random ∆-regular graphs that will be relevant for us. In particular, we give a description of the neighbourhood structure around a path. To do this accurately, we will need a few definitions. Let G = (V, E) be a graph. For a vertex v ∈ V and integer h ≥ 0, we denote by Γ_h(G, v) the set of vertices at distance ≤ h from v.

Definition 14. Let G be a graph and P be a path in G with vertices u_1, . . . , u_ℓ. Let G\P be the graph obtained from G by removing the edges of the path P. Then, for an integer h ≥ 0, the h-graph-neighbourhood of the path P is the subgraph of G\P induced by the vertex set ∪_{i∈[ℓ]} Γ_h(G\P, u_i).
A connected component of the h-graph-neighbourhood will be called isolated if it contains exactly one of the vertices u 1 , . . . , u ℓ .
Then, for any constant integer h ≥ 0 and any ǫ > 0, there exists a constant ℓ 1 > 0 such that the following holds.
We will use the following version of the well-known Chernoff/Hoeffding inequality.
Lemma 16 (see, e.g., [10, Theorem 21.6 & Corollary 21.9]). Suppose that S_n = X_1 + · · · + X_n, where {X_i}_{i∈[n]} is a collection of independent random variables such that 0 ≤ X_i ≤ 1 and E[X_i] = µ_i for i = 1, . . . , n. Let µ = µ_1 + · · · + µ_n. Then, for any c > 1, the corresponding multiplicative tail bound holds.

Proof of Lemma 15. Fix an arbitrary integer h ≥ 0 and constant ǫ > 0. We will show that the lemma holds with ℓ_1 := 200/ǫ. In the configuration model, a path P with ℓ vertices corresponds to an ordered 2(ℓ − 1)-tuple of points (u_1, . . .). Fix any such path P with ℓ vertices u_1, . . . , u_ℓ and condition on the event that P appears in G. We will next reveal the h-graph-neighbourhood of P in a breadth-first-search manner, as follows.
5. Order the vertices in U_t in lexicographic order.
6. Pick the k-th vertex in U_t, say u (if k > |U_t|, set u = 0).
7. If u ≠ 0 and the point (u, i) is not already paired,
8. pair (u, i) with a point not already paired, say (v, i′), selected uniformly at random.
9. If . . .

By induction, we have that, for all t ≥ 0, U_t consists of the set of vertices at distance t from a vertex in {u_1, . . . , u_ℓ}. Note also that |U_t| ≤ ℓ∆^t. Now, fix arbitrary t ∈ {0, 1, . . . , h}, k ∈ [ℓ∆^t] and i ∈ [∆]. Let F_{t,k,i} be the pairings that we have revealed about the graph G just before executing lines 7-9. Similarly, let S_{t,k,i} be the set of vertices we have encountered just before executing lines 7-9 (i.e., the union of U_1, . . . , U_t together with the current set U_{t+1}). Let also Q_{t,k,i} be the event that in lines 7-8 all of the following happen: (i) in line 7, u ≠ 0 and the point (u, i) is not paired, and (ii) in line 8, (u, i) gets paired to a point in S_{t,k,i} × [∆]. There are at least ∆(n − |S_{t,k,i}|) points that have not been paired, so the conditional probability of Q_{t,k,i} given F_{t,k,i} is at most 2ℓ∆^{h+2}/n; in the last inequality we used that ℓ_1 ≤ ℓ ≤ n^{9/10} for all sufficiently large n. Using (5), we obtain that the number of events {Q_{t,k,i}} that occur is stochastically dominated by a binomial r.v. X_ℓ ∼ Bin(ℓ∆^{h+2}, 2ℓ∆^{h+2}/n). By Lemma 16, we obtain a tail bound for X_ℓ; since ℓ ≤ n^{9/10}, we have c_ℓ ≥ en^{1/20} for all sufficiently large n. It follows that, for any path P with ℓ vertices, with probability ≥ 1 − e^{−(1/40)ǫℓ log n}, at most ǫℓ/2 of the events {Q_{t,k,i}} occur, i.e., the h-graph-neighbourhood of P contains at least (1 − ǫ)ℓ isolated tree components (every event Q_{t,k,i} that occurs decreases the number of isolated tree components by at most two; on the other hand, if Q_{t,k,i} does not occur, then the number of isolated tree components stays the same).
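The lazy revealing of the pairing described above can be sketched as follows; this is a simplified illustration with our own function names, not the paper's exact lines 5-9, and it assumes ∆n is even so that a full pairing of the points exists:

```python
# Sketch: lazily reveal a uniformly random pairing of the configuration-model
# points (v, i), v in [n], i in [delta], in breadth-first order around a set
# of start vertices, up to depth h.
import random

def lazy_reveal(n, delta, starts, h, seed=0):
    rng = random.Random(seed)
    points = [(v, i) for v in range(n) for i in range(delta)]
    partner = {}                     # revealed pairs of the random pairing
    levels = [set(starts)]
    seen = set(starts)
    for _ in range(h):
        frontier = set()
        for u in sorted(levels[-1]):
            for i in range(delta):
                if (u, i) in partner:
                    continue
                # pair (u, i) with a uniformly random still-unpaired point
                free = [p for p in points if p not in partner and p != (u, i)]
                v, j = rng.choice(free)
                partner[(u, i)] = (v, j)
                partner[(v, j)] = (u, i)
                if v not in seen:
                    frontier.add(v)
        seen |= frontier
        levels.append(frontier)
    return partner, levels
```

Because only O(ℓ∆^{h+1}) pairs are revealed, the rest of the pairing stays uniformly random conditioned on what has been seen, which is the point of the argument above.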
Since there are at most n∆^ℓ paths with ℓ vertices, we obtain by a union bound that the probability that there exists a path whose h-graph-neighbourhood contains fewer than (1 − ǫ)ℓ isolated tree components is upper bounded by a sum of terms of the form e^{log n + ℓ log ∆ − (1/40)ǫℓ log n}; the last bound follows by observing that, for all sufficiently large n, the summands are decreasing functions of ℓ and that for ℓ = ⌈200/ǫ⌉ we have e^{log n + ℓ log ∆ − (1/40)ǫℓ log n} = O(1/n^3). This concludes the proof of Lemma 15.
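The displayed bound of Lemma 16 is elided in this copy; the way it is applied above (with c_ℓ ≥ en^{1/20}) is consistent with the standard multiplicative Chernoff form Pr(S_n ≥ cµ) ≤ (e/c)^{cµ}. The following is a hedged check of that presumed form (not of the paper's exact statement), exact for binomial sums:

```python
# Exact check of the presumed Chernoff-type bound
#     Pr(Bin(n, p) >= c*mu) <= (e/c)^(c*mu),   mu = n*p, c > 1,
# meaningful for c > e; we test values of c in that range.
import math

def binom_tail(n, p, k):
    """Exact Pr(Bin(n, p) >= k)."""
    k = math.ceil(k)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for n in (30, 60, 120):
    for p in (0.05, 0.1):
        mu = n * p
        for c in (3.0, 4.0, 6.0):
            if c * mu <= n:
                assert binom_tail(n, p, c * mu) <= (math.e / c) ** (c * mu)
print("tail bound holds on all tested cases")
```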
To conclude this section, we clarify a small point relevant to Remark 1. We will only apply Lemma 15 to paths of logarithmic length (even though, for convenience, the lemma is stated for much longer paths), and therefore the property can be checked in polynomial time. Analogously, the sum in Lemma 13 will only be considered for cycles of logarithmic length, and therefore the (restricted) inequality can also be checked in polynomial time.
Algorithm SampleRC(G). Parameters: reals p ∈ (0, 1) and q ≥ 1.
. . .
Output the resulting set S ⊆ E.

Figure 1: Algorithm for sampling a random-cluster configuration.

Note that, since G′ is a disjoint union of short cycles, the initial configuration S′ in the algorithm above can be obtained quickly in various ways (e.g., even brute force takes time O*(n^{6/5}), since each cycle has length ≤ (1/5) log_{∆−1} n and there are at most n cycles).

The Algorithm
To prove Theorem 6, we will consider a simple percolation algorithm for sampling a random-cluster configuration on a random ∆-regular graph G. The algorithm is given in Figure 1 and in Section 5.3 we will detail its performance when the input is a random regular graph.
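Since the pseudocode of Figure 1 is garbled in this copy, the following runnable sketch follows the textual description instead: an exact sample on the short-cycle subgraph, followed by percolation-style addition of the remaining edges, each kept open with probability p/(p + (1 − p)q). All helper names are our own, and the brute-force initial sampler is only meant for tiny graphs:

```python
# Sketch of SampleRC on a small graph, per the textual description.
import itertools, random

def num_components(n, S):
    """Number of connected components of ([n], S), via union-find."""
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in S:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return len({find(v) for v in range(n)})

def exact_rc_sample(n, edges, p, q, rng):
    """Brute-force exact sample from the RC distribution (tiny graphs only)."""
    subsets = [S for r in range(len(edges) + 1)
               for S in itertools.combinations(edges, r)]
    weights = [p**len(S) * (1 - p)**(len(edges) - len(S)) * q**num_components(n, S)
               for S in subsets]
    return set(rng.choices(subsets, weights=weights)[0])

def sample_rc(n, short_cycle_edges, other_edges, p, q, seed=0):
    rng = random.Random(seed)
    # exact initial configuration on the short-cycle subgraph
    S = exact_rc_sample(n, short_cycle_edges, p, q, rng)
    # add back the remaining edges, each open with probability p/(p+(1-p)q)
    for e in other_edges:
        if rng.random() < p / (p + (1 - p) * q):
            S.add(e)
    return S
```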
Prior to that, let us first motivate the algorithm SampleRC, by demonstrating how to update an RC configuration when we add a single edge {u, v}. To control the effect of adding an edge, it will be relevant to consider the event that there is an open path between u and v (for a path P in G and an RC configuration S ⊆ E, we say that P is open in S if all of its edges belong to S); we denote this event by u ↔ v.
Lemma 17. Let G = (V, E) be a graph and u, v be two vertices such that {u, v} ∉ E and ϕ_G(u ↔ v) ≤ ǫ. Consider the graph G′ = (V, E′) obtained from G by adding the edge e = {u, v}. Sample a random subset of edges Y ⊆ E′ as follows: first, sample a subset of edges X ⊆ E according to the RC distribution ϕ_G and, then, set Y = X ∪ {e} with probability p/(p + (1 − p)q), and Y = X otherwise.
Then, the distribution of Y, denoted by ν_Y, is within total variation distance 2qǫ from the RC distribution ϕ_{G′} on G′ with parameters p, q.

Proof. Fix an arbitrary ǫ ∈ (0, 1/q).
Let Ω_op be the set of subsets S ⊆ E such that, in the graph (V, S), u and v are connected by an open path, and let Ω_cl = 2^E \ Ω_op. For S ⊆ E, denote for convenience by S_e the set S ∪ {e}. Observe that, for all S ∈ Ω_cl, (6) holds. Note also that, for any S ⊆ E, (7) holds. Using (6), (7) and the assumption ϕ_G(u ↔ v) ≤ ǫ, we will also show the bound (8) for the partition functions Z_G, Z_{G′}. Let us conclude the proof assuming, for now, (8).
For S ⊆ E, we have an explicit expression for ν_Y(S), and therefore we can bound the first sum in (9) as in (10). To bound the second sum in (9), we consider whether S ∈ Ω_cl or S ∈ Ω_op. For S ∈ Ω_cl, we obtain (11), where the last inequality follows from the bound on |qM|. Plugging (10) and (11) in (9), we obtain that ‖ν_Y − ϕ_{G′}‖_TV ≤ 2qǫ, as wanted.
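Lemma 17 can be checked exactly on a tiny instance by enumerating the RC distribution; here u and v are joined by a two-edge path and e = {u, v} is the new edge (all helper code is our own):

```python
# Exact finite check of Lemma 17: TV(nu_Y, phi_{G'}) <= 2*q*eps,
# where eps = phi_G(u <-> v), on a 3-vertex graph.
import itertools

def num_components(n, S):
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for x, y in S:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry
    return len({find(v) for v in range(n)})

def connected(n, S, a, b):
    # a and b are connected iff adding edge (a, b) does not merge components
    return num_components(n, S) == num_components(n, list(S) + [(a, b)])

def rc_dist(n, edges, p, q):
    w = {frozenset(S): p**len(S) * (1 - p)**(len(edges) - len(S))
                       * q**num_components(n, S)
         for r in range(len(edges) + 1)
         for S in itertools.combinations(edges, r)}
    Z = sum(w.values())
    return {S: x / Z for S, x in w.items()}

n, p, q = 3, 0.2, 2.0
E = [(0, 1), (1, 2)]            # u = 0 and v = 2 joined by the path 0-1-2
e = (0, 2)                      # the new edge {u, v}
phi = rc_dist(n, E, p, q)
eps = sum(w for S, w in phi.items() if connected(n, S, 0, 2))

# nu_Y: add e to an RC sample with probability p/(p+(1-p)q)
alpha = p / (p + (1 - p) * q)
nu = {}
for S, w in phi.items():
    nu[S] = nu.get(S, 0.0) + w * (1 - alpha)
    nu[S | {e}] = nu.get(S | {e}, 0.0) + w * alpha

phi2 = rc_dist(n, E + [e], p, q)
tv = 0.5 * sum(abs(nu.get(S, 0.0) - phi2.get(S, 0.0))
               for S in set(nu) | set(phi2))
assert tv <= 2 * q * eps
print(f"TV = {tv:.5f} <= 2*q*eps = {2 * q * eps:.5f}")
```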

Aggregating the error
To utilise Lemma 17, we need to upper bound the probability that two vertices belong to the same component in an RC configuration. In turn, it suffices to bound the probability that there is an open path between the vertices. To this end, we utilise the fact that the parameters p, q are in the uniqueness region of the (∆ − 1)-ary tree, together with the tree-like structure around paths (cf. Definition 14), to show the following lemma; its proof is given in Section 7.
Lemma 18. Let ∆ ≥ 3 be an integer, q ≥ 1 and p < p_c(q, ∆). There exist constants K < 1/(∆ − 1) and ǫ > 0 such that the following holds for all sufficiently large integers ℓ and h. Let G be a ∆-regular graph and P be a path with ℓ vertices whose h-graph-neighbourhood contains (1 − ǫ)ℓ isolated tree components. Let ϕ_G be the RC distribution on G with parameters p, q. Then, ϕ_G(P is open) ≤ K^ℓ.

Using monotonicity properties of the RC distribution, we can extend Lemma 18 to arbitrary subgraphs of a target graph G. In particular, suppose that G, P are as in Lemma 18 and that G′ is a subgraph of G which contains the path P. Then it also holds that ϕ_{G′}(P is open) ≤ K^ℓ. We will not define the notion of monotonic distributions in its full generality; instead, we just state the following property of RC distributions, which will be sufficient for our purposes.

Lemma 19 (see, e.g., [15, Chapter 2]). Let G = (V, E) be a graph and consider the RC distribution on G with parameters p ∈ (0, 1) and q ≥ 1. Then, for any subsets S, S′ ⊆ E such that S ⊆ S′, the corresponding comparison holds for any increasing event F.

Combining Lemmas 13, 15 and 18, we can now conclude the following.
Lemma 20. Let ∆ ≥ 3 be an integer, q ≥ 1 and p < p c (q, ∆). Then, there exists a constant δ > 0 such that, as n → ∞, the following holds with probability 1 − o(1) over the choice of a uniformly random ∆-regular graph G = (V, E) with n vertices.
Proof. We will show that, for any ∆-regular graph G which satisfies Items 1 and 2, the bound (12) holds, where e_1 = {u_1, v_1}, . . . , e_t = {u_t, v_t} are the edges of G that do not belong to short cycles (i.e., cycles of length ≤ ℓ_0 log n), G_j is the subgraph G\{e_1, . . . , e_j}, and ϕ_{G_j} is the RC distribution on G_j with parameters p, q. Decreasing the value of δ does not affect the validity of Item 1, and hence we will assume that δ ∈ (0, 1). For j ∈ [t], consider the edge e_j = {u_j, v_j} and let P_{ℓ,j} denote the number of paths with ℓ vertices in G whose endpoints are u_j and v_j. Using the fact that G satisfies Item 2, we will show shortly that (13) holds for all j ∈ [t]. Let us assume (13) for now and conclude the proof of (12). Summing (13) over j ∈ [t] (and using the trivial bound t ≤ |E| ≤ ∆n/2), we obtain (14), where in the last inequality we used that ∑_{j=1}^{t} P_{ℓ,j} ≤ ℓC_ℓ; this follows from the observation that every path with ℓ vertices connecting the endpoints of an edge {u_j, v_j} maps to a cycle of length ℓ (by adding the edge {u_j, v_j}) and each cycle of length ℓ can arise at most ℓ times under this mapping. Using (14) and the fact that G satisfies Item 1, we obtain (12), as wanted.
To finish the proof, it only remains to prove (13). Since G satisfies Item 2 and W = 1/K, by Lemma 18 we have that (15) holds for any path P of length ℓ ∈ [L_0, L_2 + 1] connecting u_j, v_j. Since G_j is a subgraph of G, any path in G_j that connects u_j and v_j is also present in G. Moreover, ϕ_{G_j} is obtained by conditioning some edges of G to be closed (namely, e_1, . . . , e_j). Therefore, by Lemma 19, we conclude (16) from (15). Since the edge {u_j, v_j} does not belong to a short cycle, any path P in G_j connecting u_j and v_j has length at least ℓ_0 log n. We can therefore bound the probability of an open path between u_j and v_j by the probability of E_j ∪ F_j, where E_j is the event that there exists an open path P with ℓ vertices, L_0 ≤ ℓ ≤ L_2, connecting u_j and v_j, whereas F_j is the event that there exists an open path P with ℓ = L_2 + 1 vertices starting from u_j (the other endpoint can be v_j or any other vertex of the graph). Using (16), we conclude by a union bound over paths; in the bound for ϕ_{G_j}(E_j) we used that there are P_{ℓ,j} paths with ℓ vertices connecting u_j and v_j, while in the bound for ϕ_{G_j}(F_j) we used that there are at most ∆(∆ − 1)^{L_2−1} paths in G with L_2 + 1 vertices starting from u_j (since G has maximum degree ∆), the trivial inequalities ℓ_2 log n − 1 ≤ L_2 ≤ ℓ_2 log n, and the choice of ℓ_2, which guarantees that ℓ_2 ≥ 4 log(W/(∆ − 1)).
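The monotonicity step above (Lemma 19, whose display is elided in this copy) can be illustrated in the special case actually used: conditioning an edge of the RC model with q ≥ 1 to be closed cannot increase the probability of an increasing event, here "f is open". An exact check on a small graph:

```python
# Exact check, on a 4-cycle plus a chord, of the monotonicity property:
# Pr(f open | e closed) <= Pr(f open) for the RC model with q >= 1.
import itertools

def num_components(n, S):
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for x, y in S:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry
    return len({find(v) for v in range(n)})

n, p, q = 4, 0.4, 1.5
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
weight = {frozenset(S): p**len(S) * (1 - p)**(len(edges) - len(S))
                        * q**num_components(n, S)
          for r in range(len(edges) + 1)
          for S in itertools.combinations(edges, r)}

def prob(event, cond=lambda S: True):
    num = sum(w for S, w in weight.items() if event(S) and cond(S))
    den = sum(w for S, w in weight.items() if cond(S))
    return num / den

e, f = (0, 1), (2, 3)
p_unconditional = prob(lambda S: f in S)
p_e_closed = prob(lambda S: f in S, cond=lambda S: e not in S)
assert p_e_closed <= p_unconditional + 1e-12
print(f"{p_e_closed:.4f} <= {p_unconditional:.4f}")
```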

Combining the pieces - Proof of Theorem 6
We are now able to prove the following theorem, which details the performance of the algorithm SampleRC on random ∆-regular graphs and yields as an immediate corollary Theorem 6.
Theorem 21. Let ∆ ≥ 3, q ≥ 1 and p < p_c(q, ∆). Then, there exists a constant δ > 0 such that, as n → ∞, the following holds with probability 1 − o(1) over the choice of a random ∆-regular graph G = (V, E) with n vertices. The output of Algorithm SampleRC(G) (cf. Figure 1) is a set S ⊆ E whose distribution ν_S is within total variation distance O(1/n^δ) from the RC distribution ϕ_G with parameters p, q.

Proof. By Lemmas 12 and 20, we have by a union bound that a uniformly random ∆-regular graph G with n vertices satisfies the following with probability 1 − o(1):

1. any two distinct cycles of length ≤ (1/5) log_{∆−1} n are disjoint;

2. . . . , where e_1 = {u_1, v_1}, . . . , e_t = {u_t, v_t} are the edges of G that do not belong to short cycles, and G_j is the subgraph G\{e_1, . . . , e_j} (for j ∈ [t]).
We will show that, for any graph G = (V, E) that satisfies Items 1 and 2, the output of the algorithm SampleRC is a random set S ⊆ E whose distribution ν_S is within total variation distance 1/n^δ from the RC distribution ϕ_G with parameters p, q, therefore proving the result.
Let G_0, G_1, . . . , G_t be the sequence of subgraphs as in Item 2 above; in particular, G_t is the subgraph of G in which only the edges that belong to short cycles appear. By Item 1, we have that G_t consists of isolated vertices and disjoint cycles, and hence we can conclude that the output of the algorithm SampleRC is not Fail; i.e., on input a graph G satisfying Items 1 and 2, SampleRC outputs a random set S ⊆ E. It therefore remains to show that the distribution ν_S of S satisfies (19).

For j ∈ [t], we have that S_{j−1} is obtained from S_j by adding the edge e_j with probability p/(p + (1 − p)q). Let Ŝ_{j−1} ⊆ E_{j−1} be a subset of edges obtained by sampling an RC configuration from G_j (according to ϕ_{G_j}) and adding the edge e_j with probability p/(p + (1 − p)q); denote by ν_{Ŝ_{j−1}} the distribution of Ŝ_{j−1}. By Lemma 17, we have the bound (20). Moreover, since in each of S_{j−1} and Ŝ_{j−1} the edge e_j appears independently with the same probability p/(p + (1 − p)q), the distance between their distributions is controlled accordingly. Using the triangle inequality and induction, we obtain the corresponding bound for all j = 0, 1, . . . , t. Writing this out for j = 0 and using (20), we obtain the desired estimate, where in the last inequality we used that the graph G satisfies Item 2. This finishes the proof of (19) and therefore the proof of Theorem 21 as well.

Algorithm for the antiferromagnetic Potts model
In this section, we give the details of our sampling algorithm for the antiferromagnetic Potts model (outlined in Section 3). The section is organised as follows. First, in Section 6.1, we formalise the connection between the Potts model on bichromatic classes and the Ising model. Then, in Section 6.2, we state the sampling algorithm for the Ising model on graphs with small average growth that we are going to use for resampling bichromatic classes in the Potts model; moreover, we state certain correlation decay properties for the Ising model that will be relevant for analysing the error of our Potts sampler. In Section 6.3, we state the key lemma that allows us to bound the average growth of bichromatic classes in the Potts model on random regular graphs. In Section 6.4, we show an "idealised" subroutine that updates a Potts configuration when we add a new edge {u, v}; the subroutine works by resampling an appropriately chosen bichromatic class and it is "idealised" in the sense that it assumes that certain steps can be carried out efficiently. In Section 6.5, we modify the subroutine to make it computationally efficient by considering the average growth of bichromatic classes that get resampled; there, we give the complete description of the actual resampling subroutine used in our Potts sampler. With these pieces in place, we are in position to complete the description and analysis of the Potts sampler in Section 6.6.

Connection between Potts on bichromatic classes and the Ising model
In this section, we describe the connection between the Potts model on bichromatic classes and the Ising model. Recall that the Ising model is the special case q = 2 of the Potts model; to distinguish between the models, we will use π_G to denote the Ising distribution on G with parameter B. Sometimes, we will need to replace the binary set of states {1, 2} in the Ising model by other binary sets to facilitate the arguments; we use π_G^{c_1,c_2} to denote the Ising distribution with binary set of states {c_1, c_2} (we will have c_1, c_2 ∈ [q]).
For a configuration σ : V → [q], we will denote by σ U the restriction of σ to the set U . Our sampling algorithm is based on the following simple observation.
Observation 22. Conditioned on U being the (c, c′)-colour-class in the Potts distribution µ_G, the marginal distribution on U is the Ising distribution π_{G[U]} (with set of states {c, c′}).
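Observation 22 can be verified by brute-force enumeration on a small example (a 4-cycle with q = 3); the choice of graph and parameters below is arbitrary:

```python
# Exact check of Observation 22: conditioned on U being the (c1, c2)-colour
# class of a Potts configuration, the restriction to U is the Ising
# distribution on G[U] with the same parameter B.
import itertools

B, q = 0.3, 3
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]        # a 4-cycle
U, c1, c2 = (0, 1), 1, 2                    # candidate colour class and colours

def mono(sig, edges):
    """Number of monochromatic edges; the Potts/Ising weight is B**mono."""
    return sum(sig[x] == sig[y] for x, y in edges)

# distribution of sigma restricted to U, given sigma^{-1}({c1, c2}) = U
cond = {}
for sigma in itertools.product(range(1, q + 1), repeat=len(V)):
    if all((sigma[v] in (c1, c2)) == (v in U) for v in V):
        key = tuple(sigma[v] for v in U)
        cond[key] = cond.get(key, 0.0) + B ** mono(sigma, E)
Zc = sum(cond.values())

# Ising distribution on G[U] with states {c1, c2}
EU = [(x, y) for x, y in E if x in U and y in U]
ising = {tau: B ** mono(dict(zip(U, tau)), EU)
         for tau in itertools.product((c1, c2), repeat=len(U))}
Zi = sum(ising.values())

for tau in ising:
    assert abs(cond[tau] / Zc - ising[tau] / Zi) < 1e-12
print("Observation 22 verified on the example")
```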
The following definition will be notationally convenient.
Definition 23. Let G be a graph, u, v be vertices in G and c, c′ be distinct colours in [q]. We write π_{G,u,v}^{c,c′} to denote the Ising distribution on G (with set of states {c, c′}) conditioned on u taking the state c and v taking the state c′.

Sampling the Ising model on graphs with small average growth
Recall from Section 3 that our algorithm for sampling the antiferromagnetic Potts model with parameter B will use as a subroutine a sampling algorithm for the Ising model with parameter B to recolour bichromatic classes. In general, these classes may consist of large bichromatic components (with a linear number of vertices), so to carry out this subroutine efficiently, we need to use an approximate sampling algorithm; our leverage point will be that we can bound the average growth of bichromatic components, cf. Definition 11. Adapting results of [27], we show the following in Section 9.2.
Theorem 24. Let B ∈ (0, 1) and b > 0 be constants such that b(1 − B)/(1 + B) < 1, and let ∆ ≥ 3 be an integer. Then, there exists M_0 > 0 such that the following holds for all M > M_0.
There is a polynomial-time algorithm that, on input an n-vertex graph G with maximum degree at most ∆ and average growth b up to depth L = ⌈M log n⌉, outputs a configuration τ : V → {1, 2} whose distribution ν τ is within total variation distance 1/n 10 from the Ising distribution on G with parameter B, i.e., ν τ − π G TV ≤ 1/n 10 .
Moreover, the algorithm, when given as additional input two vertices u and v in G, outputs a configuration τ : V → {1, 2} such that τ_u = 1 and τ_v = 2, and whose distribution ν_τ satisfies the analogous guarantee with respect to π_{G,u,v}^{1,2}, the Ising distribution on G conditioned on u having state 1 and v having state 2.

In addition, we will use the following spatial mixing result to analyse the accuracy of our algorithm for the antiferromagnetic Potts model. The proof is given in Section 9.3.
is the number of paths with ℓ vertices in G that connect u and v.
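Definition 11 (average growth) is not restated in this chunk; consistently with how it is used later (cf. the variables X_v in the proof of Theorem 32), we take "average growth b up to depth L" to mean that every vertex is the starting point of at most b^L paths with L vertices. Under that reading, the property can be checked directly:

```python
# Hedged sketch: checking "average growth b up to depth L" under our presumed
# reading of Definition 11. Paths are self-avoiding.

def count_paths_from(adj, v, L):
    """Number of paths with L vertices (L-1 edges) starting at v."""
    def extend(path):
        if len(path) == L:
            return 1
        return sum(extend(path + [w]) for w in adj[path[-1]] if w not in path)
    return extend([v])

def has_average_growth(adj, b, L):
    return all(count_paths_from(adj, v, L) <= b ** L for v in adj)

# example: in the cycle C_8 every vertex starts exactly 2 paths with 4 vertices
adj = {v: [(v - 1) % 8, (v + 1) % 8] for v in range(8)}
assert count_paths_from(adj, 0, 4) == 2
assert has_average_growth(adj, 1.2, 4) and not has_average_growth(adj, 1.0, 4)
```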
We will also need the following crude bound.
Lemma 26. Let B ∈ (0, 1) and ∆ ≥ 3 be an integer. Suppose that G is a graph of maximum degree at most ∆, and let u be a vertex and Λ be a set of vertices in G such that u ∉ Λ. Then, for every configuration τ : Λ → {1, 2} and s ∈ {1, 2}, the stated two-sided bounds on π_G(σ_u = s | σ_Λ = τ) hold.

Proof. Without loss of generality, we assume that s = 1. Let D be the number of neighbours of u, and let u_1, . . . , u_d be the neighbours of u in G \ Λ; note that d ≤ D ≤ ∆. Let d_0 be the number of v ∈ Λ such that v is a neighbour of u and τ_v = 1. Let s_1, . . . , s_d ∈ {1, 2} be arbitrary and let d_1 be the number of the s_i's that are equal to 1. Then we have an explicit expression for the conditional probability of σ_u = 1 given these neighbour states. Since u_1, . . . , u_d ∉ Λ, we have that π_G(σ_{u_1} = s_1, . . . , σ_{u_d} = s_d | σ_Λ = τ) > 0 for any choice of s_1, . . . , s_d, and therefore, by the law of total probability, we obtain two-sided bounds given by the extremes over d_2 ∈ {d_0, . . . , d_0 + d} (all of which lie in (0, 1)). Using that B ∈ (0, 1) and D ≤ ∆, we obtain the inequalities in the statement of the lemma.
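The displayed bounds of Lemma 26 are elided above; the proof suggests they are B^∆/(1 + B^∆) ≤ π_G(σ_u = s | σ_Λ = τ) ≤ 1/(1 + B^∆). A brute-force check of this presumed form on a small graph:

```python
# Hedged check of the presumed bounds of Lemma 26 by exact enumeration.
import itertools

B = 0.4
V = [0, 1, 2, 3]
E = [(0, 1), (0, 2), (0, 3), (1, 2)]   # u = 0 has degree Delta = 3
Delta, u, Lam = 3, 0, (2, 3)           # u is not in Lambda

def weight(sig):
    """Ising weight B**(#monochromatic edges), states in {1, 2}."""
    return B ** sum(sig[x] == sig[y] for x, y in E)

lo, hi = B**Delta / (1 + B**Delta), 1 / (1 + B**Delta)
margs = []
for tau in itertools.product((1, 2), repeat=len(Lam)):
    tot = {1: 0.0, 2: 0.0}
    for s0, s1 in itertools.product((1, 2), repeat=2):  # free vertices 0 and 1
        sig = {0: s0, 1: s1, 2: tau[0], 3: tau[1]}
        tot[s0] += weight(sig)
    for s in (1, 2):
        margs.append(tot[s] / (tot[1] + tot[2]))
assert all(lo - 1e-12 <= m <= hi + 1e-12 for m in margs)
print("presumed bounds of Lemma 26 hold on the example")
```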

Average growth of bichromatic components in the Potts distribution
To utilise Theorem 24 and Lemma 25 for our sampling algorithms, we will need to bound the average growth of bichromatic components in a typical Potts configuration on a random regular graph. Our key lemma to achieve this bounds the probability that a path is bichromatic in uniqueness, provided that the local neighbourhood around the path (in the sense of Definition 14) has a tree-like structure. The following lemma quantifies this probability bound and is proved in Section 8.2. The proof uses the fact that the parameter B lies in the uniqueness regime of the (∆ − 1)-ary tree.

Lemma 27. Let G be a graph of maximum degree ∆ and P be a path with ℓ vertices whose h-graph-neighbourhood contains (1 − ǫ)ℓ isolated tree components. Let µ_G be the Potts measure on G with parameter B. Then, µ_G(P is bichromatic) ≤ K^ℓ.

For a random ∆-regular graph G, paths do have the tree-like structure of Lemma 27 (cf. Lemma 15), and hence we can aggregate over all paths emanating from an arbitrary vertex (roughly (∆ − 1)^ℓ of them) and get a bound of roughly (∆ − 1)K < (∆ − 1)(1 + B)/(B + q − 1) for the average growth of bichromatic components in a typical configuration σ. This will allow us to use the upcoming ReSample subroutine for updating a bichromatic class using the Ising sampler of Section 6.2.

Analysing an Ideal ReSample subroutine
In this section, we give a preliminary description and analysis of the ReSample subroutine that updates a Potts configuration when we add a new edge {u, v}. The ReSample subroutine is inspired by the approach of Efthymiou [8] for colourings.
In fact, for the moment, we will only study an "idealised" version of ReSample which we call IdealReSample; the subroutine is idealised in the sense that it assumes that certain steps can be carried out efficiently. Later, we will modify the subroutine to obtain the actual ReSample subroutine whose running time will be polynomial with respect to the size of the input graph.
The point of analysing first IdealReSample is to give the key ideas behind the underlying resampling step without bothering for the moment to make the subroutine computationally efficient. Moreover, the detour is going to be smaller than it might appear since the analysis of the actual ReSample subroutine will follow from the analysis of IdealReSample.
The IdealReSample subroutine takes as inputs a graph G, two vertices u and v of G and a configuration σ on G such that σ u = σ v ; it outputs a configuration σ ′ on G by updating the configuration on an appropriately chosen bichromatic class containing the vertices u and v. The details of the subroutine can be found in Figure 2. The subroutine will be used to update a Potts configuration when we add a new edge {u, v}, see the upcoming Lemma 30.
Algorithm IdealReSample(G, u, v, σ). Parameters: real B ∈ (0, 1), integer q ≥ 3.
. . .
Sample an Ising configuration τ on H conditioned on τ_u = c and τ_v = c′; more precisely: . . .

Figure 2: The IdealReSample subroutine; we will later modify this to obtain the actual ReSample subroutine used in Algorithm SampleAntiPotts (cf. Figure 4).
To control the output distribution of the IdealReSample subroutine, the following definition will be crucial.
Definition 28. Let B > 0. Suppose that G = (V, E) is a graph and that u, v are vertices in G. For a set U ⊆ V such that u, v ∈ U , let To state the main lemma of this section, we will also need the following definition of a "random bichromatic class" containing two specific vertices u and v under a configuration σ.
Definition 29. Let G = (V, E) be a graph, u, v be vertices in G and σ : V → [q] be a configuration on G. We let U σ ⊆ V be a bichromatic class under σ which contains u and v, chosen uniformly at random among the set of all such classes if there is more than one.
More precisely, if σ_u ≠ σ_v, then U_σ is the (c_1, c_2)-colour-class in σ, where c_1, c_2 are the colours of u and v under σ. If σ_u = σ_v, then U_σ is the (c, c′)-colour-class in σ, where c is the common colour of u and v under σ and c′ is a uniformly random colour from [q]\{c}.
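The class selection of Definition 29 is straightforward to implement; a minimal sketch with our own names:

```python
# Sketch of Definition 29: selecting the random bichromatic class U_sigma
# containing u and v under a configuration sigma (a dict vertex -> colour).
import random

def bichromatic_class(sigma, u, v, q, rng):
    """Vertex set of the (c, c')-colour-class containing u and v."""
    if sigma[u] != sigma[v]:
        c, cp = sigma[u], sigma[v]
    else:
        c = sigma[u]
        cp = rng.choice([col for col in range(1, q + 1) if col != c])
    return {w for w, col in sigma.items() if col in (c, cp)}

rng = random.Random(1)
sigma = {0: 1, 1: 2, 2: 3, 3: 1}
assert bichromatic_class(sigma, 0, 1, 3, rng) == {0, 1, 3}
U = bichromatic_class(sigma, 0, 3, 3, rng)       # sigma_0 = sigma_3 = 1
assert {0, 3} <= U and U in ({0, 1, 3}, {0, 2, 3})
```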
The following lemma will be critical for our Potts sampler. It shows how to update a Potts configuration when we add a new edge {u, v}, based on the IdealReSample subroutine. It also controls the error introduced based on the "average correlation" between u and v in a random bichromatic class that contains them.
Lemma 30. Let G = (V, E) be a graph and µ_G be the Potts distribution on G with parameter B. Suppose that u, v are vertices in G such that {u, v} ∉ E and E[Corr_G(U_σ, u, v)] ≤ ǫ, where the expectation is over the choice of a random configuration σ ∼ µ_G and the choice of a random bichromatic class U_σ ⊆ V containing u and v under σ (cf. Definition 29).
Consider the graph G′ = (V, E′) obtained from G by adding the edge {u, v}. Sample a configuration σ′ : V → [q] as follows. First, sample σ : V → [q] according to µ_G. Then, if σ_u ≠ σ_v, set σ′ = σ; otherwise, set σ′ = IdealReSample(G, u, v, σ). Then, the distribution of σ′, denoted by ν_{σ′}, is within total variation distance 2ǫ/B from the Potts distribution µ_{G′} on G′ with parameter B.

Proof. We begin with a few definitions that will be used throughout the proof. Fix distinct colours c_1, c_2 ∈ [q] and a set U ⊆ V such that u, v ∈ U. Let Ω(U, c_1, c_2) be the set of configurations such that U is the (c_1, c_2)-colour-class. We will be interested in two particular types of configurations in Ω(U, c_1, c_2): those where u, v take the colours c_1, c_2, respectively, and those where u, v both take the colour c_1; denote these sets by Ω_neq(U, c_1, c_2) and Ω_eq(U, c_1, c_2). Note the asymmetry in these definitions with respect to c_1, c_2; e.g., for η ∈ Ω_eq(U, c_1, c_2), we have that η_u, η_v ≠ c_2. For a configuration η ∈ Ω(U, c_1, c_2), we also denote by Ω^η(U, c_1, c_2) the set of configurations in Ω(U, c_1, c_2) that agree with η on V\U; we define analogously the sets Ω^η_neq(U, c_1, c_2), Ω^η_eq(U, c_1, c_2). Using Observation 22 and the Ising distribution π_{G[U]}^{c_1,c_2} (cf. Section 6.1), we have that, for any η ∈ Ω(U, c_1, c_2), the identities (21) and (22) hold. Also, the assumption E[Corr_G(U_σ, u, v)] ≤ ǫ translates into (23). To see this, for a set U ⊆ V such that u, v ∈ U, we find how much Corr_G(U, u, v) contributes to E[Corr_G(U_σ, u, v)]. Note that, for a configuration σ to have U_σ = U (cf. Definition 29), there must exist distinct colours c_1, c_2 ∈ [q] such that U = σ^{−1}({c_1, c_2}) and either (i) σ(u) = c_1, σ(v) = c_2, or (ii) σ(u) = σ(v) = c_1. In case (i), we have that σ ∈ Ω_neq(U, c_1, c_2) and U_σ = U with probability 1.
We next proceed to the proof. Recalling that ν_{σ′} is the distribution of σ′, our goal is to show the bound (24).

Weight of configurations in ν_{σ′}. For distinct colours c_1, c_2 ∈ [q] and a set U ⊆ V such that u, v ∈ U, we first show the expressions (25) and (26). (Recall from Definition 23 that π_{G[U],u,v}^{c_1,c_2} is the conditional Ising distribution on G[U] where u, v take the colours c_1, c_2, respectively.) To see the expression for ν_{σ′}(η) in (25), note that we can obtain η ∈ Ω_eq(U, c_1, c_2) via the subroutine IdealReSample only if σ = η and the coin flip came up heads (because η_u = η_v = c_1). Analogously, to see the expression for ν_{σ′}(η) in (26), note that we can obtain η ∈ Ω_neq(U, c_1, c_2) if one of the following happens (using that η_u = c_1 and η_v = c_2).
• We started with the configuration σ = η; this happens with probability µ G (η). Then, we obtain η with probability 1.
Altogether, these cases occur with the probability appearing in (26). Using the assumption E[Corr_G(U_σ, u, v)] ≤ ǫ (cf. (23)), we will later show the bound (27) for the ratio of the partition functions Z_G, Z_{G′}. Let us conclude the proof assuming, for now, (27); we will prove (27) in Step II below.
Using again the definition of M in (27) and then (26), we obtain a bound in which the last inequality follows from the triangle inequality; therefore, using (21) and (22), we have (32), and hence (31) gives that (33) holds for all η ∈ Ω_neq(U, c_1, c_2). Summing (30) and (33) over the relevant configurations η, we obtain that the bound (34) holds for all U ⊆ V with u, v ∈ U and distinct colours c_1, c_2 ∈ [q]. To conclude the proof of (24), note that, analogously to (28), we have (35). Now consider the expression for ‖ν_{σ′} − µ_{G′}‖_TV from (28). We bound each term D(U, c_1, c_2) using (34) and then apply (23) and (35) to obtain that ‖ν_{σ′} − µ_{G′}‖_TV ≤ 2ǫ/B, where the last inequality follows from |M| ≤ ǫ/B (cf. (27)). This finishes the proof of (24), modulo the proof of (27), which is given below.

The ReSample subroutine
In this section, we modify the IdealReSample subroutine of Section 6.4 to make it computationally efficient; this will give us the actual ReSample subroutine that we will use in our sampling algorithm for the Potts model. To describe the ReSample subroutine, we will need the following definition.
Definition 31. Let b, M > 0 be constants. Let G = (V, E) be an n-vertex graph and let σ be a configuration on G. We say that a bichromatic component in σ is (b, M )-good if it has average growth b up to depth L = ⌈M log n⌉; we say that it is (b, M )-bad otherwise. Analogously, we say that σ is (b, M )-good if all bichromatic components in σ are good; otherwise, we say that σ is (b, M )-bad.
We are now able to describe the ReSample subroutine, which takes as inputs a graph G, two vertices u and v of G and a configuration σ on G such that σ(u) = σ(v). The subroutine is given in Figure 3.

Algorithm ReSample(G, u, v, σ). . . . Use the algorithm of Theorem 24 to sample the Ising distribution on H; more precisely: . . .

Figure 3: The ReSample subroutine used in Algorithm SampleAntiPotts (cf. Figure 4).

Analysis of the Potts algorithm
We now have all the pieces to give our algorithm for sampling from the antiferromagnetic Potts model; see the algorithm SampleAntiPotts in Figure 4. We next prove the following theorem, which details the performance of the algorithm SampleAntiPotts on random ∆-regular graphs and yields as an immediate corollary Theorem 8.

for j = t downto 1:
    . . .
    Obtain the graph G_{j−1} by adding the edge {u_j, v_j} in G_j
end
return σ = σ_0.

Figure 4: Algorithm for sampling a Potts configuration in the antiferromagnetic case B ∈ (0, 1). The details of the ReSample subroutine are given in Figure 3. While the algorithm can also be modified to work for the ferromagnetic case B > 1, we will instead use a simpler percolation algorithm via the random-cluster representation.
Theorem 32. Let ∆ ≥ 3, q ≥ 3 and B ∈ (0, 1) be in the uniqueness regime of the (∆ − 1)-ary tree. Then, there exist constants b, M, δ > 0 such that, as n → ∞, the following holds with probability 1 − o(1) over the choice of a random ∆-regular graph G = (V, E) with n vertices. The output of the algorithm SampleAntiPotts(G) (cf. Figure 4) is an assignment σ : V → [q] whose distribution ν_σ is within total variation distance O(1/n^δ) from the Potts distribution µ_G with parameter B.

Proof. Since B is in the uniqueness regime of the (∆ − 1)-ary tree, we have that B > (∆ − q)/∆ (cf. Remark 4). It follows that (1 − B)/(B + q − 1) < 1/(∆ − 1), and therefore there exists ǫ′ > 0 such that the strict inequality holds with slack ǫ′. Let K < (1 + B)/(B + q − 1) + ǫ′ and ǫ > 0 be the constants in Lemma 27 corresponding to ǫ′, and let h′, ℓ′ be positive constants such that Lemma 27 applies for all integers h ≥ h′ and ℓ ≥ ℓ′. Fix h to be any integer greater than h′. Let ℓ_1 > 0 be the constant in Lemma 15 corresponding to the values of ǫ and h. Let δ > 0 be the constant in Lemma 13 corresponding to ℓ_0 := 1/(5 log(∆ − 1)) and W := 1/(K(1 − B)/(1 + B)) (note that (43) guarantees that W > ∆ − 1). Let also b′, b be appropriate constants satisfying (∆ − 1)K < b′ < b. Let M_0, M′_0 be the constants in Theorem 24 and Lemma 25, respectively. Let M be sufficiently large so that M > max{2ℓ_0, 2M_0, 2M′_0} and the inequalities in (44) hold (for all sufficiently large n); note that such an M exists since (∆ − 1)K < b′ < b. Finally, set the remaining quantities as in (45) and (46). Taking a union bound over Lemmas 13 and 15, we have that, for all sufficiently large n, a uniformly random ∆-regular graph G = (V, E) with n vertices satisfies Items 1 and 2 below with probability 1 − o(1) over the choice of the graph, where C_ℓ is the number of cycles of length ℓ in G.
Fix any ∆-regular graph G which satisfies Items 1 and 2. The theorem will follow by showing that the output of SampleAntiPotts(G) (cf. Figure 4) is an assignment σ : V → [q] whose distribution ν_σ is within total variation distance O(1/n^δ) from the Potts measure µ_G with parameter B. To do this, as in the algorithm SampleAntiPotts(G), let e_1 = {u_1, v_1}, . . . , e_t = {u_t, v_t} be the edges of G that do not belong to short cycles (i.e., cycles of length ≤ ℓ_0 log n). For j ∈ {0, 1, . . . , t}, let G_j be the subgraph G\{e_1, . . . , e_j} and µ_j be the Potts distribution on G_j with parameter B; note that the graphs G_j are defined exactly as in the algorithm SampleAntiPotts(G). We will use σ̂_j to denote a random configuration distributed according to µ_j; note that σ_j is used to denote the configuration considered by the algorithm at the beginning of step j and, as we shall see soon, its distribution is close to that of σ̂_j on G_j.
For an integer ℓ ≥ 1, denote by P_{ℓ,j} the number of paths with ℓ vertices that connect u_j and v_j in G_j. Note that ∑_{j=1}^{t} P_{ℓ,j} ≤ ℓC_ℓ, since every path with ℓ vertices connecting the endpoints of an edge {u_j, v_j} in G_j maps to a cycle with ℓ vertices in the initial graph G (by adding the edge {u_j, v_j}), and each cycle with ℓ vertices in G can arise at most ℓ times under this mapping. Let also ǫ_j be as in (46), and let Ω_j(b, M) be the set of all (b, M)-bad configurations on G_j. Using the fact that G satisfies Item 2, we will show that the bounds (47) and (48) hold for all j = 1, . . . , t, where the expectation in (48) is over the choice of a random configuration σ̂_j distributed according to µ_j and over the choice of the random bichromatic class U_{σ̂_j} containing u_j and v_j under σ̂_j. We will prove (47) and (48) shortly; let us assume them for now and conclude the proof of the theorem.

We will prove by induction that (49) holds for all j ∈ {0, 1, . . . , t}. For j = t, the result holds trivially, since σ_t is distributed exactly according to µ_{G_t}. Assume that (49) holds for some j ∈ {1, . . . , t}; we will show that it also holds for j − 1. To do this, consider the configuration σ′_{j−1} obtained by using, at the j-th step of the algorithm, the random configuration σ̂_j distributed according to µ_{G_j} (exactly). Then, by Lemma 30 applied to the graph G_j and the inequality in (48), we obtain a corresponding total variation bound between the distribution of σ′_{j−1} and µ_{G_{j−1}}. Since σ̂_j is distributed according to µ_{G_j}, by the Coupling Lemma there exists a coupling Pr(·) of σ_j and σ̂_j such that (51) holds. Note also that, for a configuration η ∉ Ω_j(b, M), conditioned on σ_j = σ̂_j = η, we can couple σ_{j−1} and σ′_{j−1} so that σ_{j−1} ≠ σ′_{j−1} with probability at most 1/n^{10}. To see this, note that if η(u_j) ≠ η(v_j) then we trivially have σ′_{j−1} = σ_{j−1} = η.
Otherwise, σ_{j−1} and σ′_{j−1} are produced by first choosing a random bichromatic class containing u_j and v_j under η, which we can couple so that it is the same in both ReSample_{b,M}(G_j, u_j, v_j, η) and IdealReSample(G_j, u_j, v_j, η). Denote this class by U and let c_1 = η(u_j), c_2 = η(v_j). Then, from the definition of IdealReSample(G_j, u_j, v_j, η), the distribution of σ′_{j−1}(U) is given by the corresponding conditional distribution on U. Invoking the Coupling Lemma again, we therefore obtain the desired bound, where in the last inequality we used (51), (52) and Pr(σ_j = σ̂_j ∈ Ω_j(b, M)) ≤ Pr(σ̂_j ∈ Ω_j(b, M)) = µ_{G_j}(Ω_j(b, M)).
We have G_0 = G and σ_0 = σ, so (49) for j = 0 gives a bound on ‖ν_σ − µ_G‖_TV, where the last inequality holds for all sufficiently large n using the values of ǫ_j from (46) (and the crude bound t ≤ |E| ≤ ∆n). Using (45), we therefore have that ‖ν_σ − µ_G‖_TV = O(1/n^δ), where the last inequality follows from our assumption that G satisfies Item 1 (and by assuming that n is sufficiently large). This completes the proof of Theorem 32, modulo the proofs of (47) and (48), which are given below.
To prove (47) and (48), fix any value j ∈ {1, ..., t}. Since G_j is a subgraph of G and G satisfies Item 2, every path P in G_j with ℓ vertices, where ℓ ∈ [ℓ_1, L], has an h-graph-neighbourhood with at least (1 − ǫ)ℓ isolated tree components. By Lemma 27, which, recall, applies for all ℓ ≥ ℓ′ (where ℓ′ is the constant specified in the beginning of the proof), we therefore have that µ_{G_j}(P is bichromatic) ≤ K^ℓ for any path P in G_j with ℓ vertices, ℓ ∈ [max{ℓ_1, ℓ′}, L]. (54) We are now ready to prove (47). For a vertex v, let X_v be the r.v. that counts the number of bichromatic paths with L vertices that start from v in a random configuration σ ∼ µ_{G_j}. Since G_j has maximum degree ≤ ∆, there are at most ∆(∆ − 1)^{L−2} paths with L vertices starting from v, and each of them is bichromatic with probability at most K^L by (54). Hence, the expectation of X_v is at most ∆(∆ − 1)^{L−2} K^L, which is small by the choice of M (cf. (44)). By Markov's inequality we therefore have that for all v ∈ V it holds that Pr(X_v > b^L) ≤ E[X_v]/b^L ≤ 1/n^{11}, where the last inequality also follows from the choice of M (cf. (44)). Note that for (b, M)-bad configurations, i.e., configurations in Ω_j(b, M), at least one of the events X_v > b^L occurs for some v ∈ V, and therefore by a union bound over v ∈ V we have that µ_{G_j}(Ω_j(b, M)) ≤ 1/n^{10}. This finishes the proof of (47).
To prove (48), for a set U ⊆ V such that u_j, v_j ∈ U, consider the correlation Corr_{G_j}(U, u_j, v_j). Note that, by the symmetry of the states in the Ising model, we have π_{G[U]}(τ_{v_j} = 1) = π_{G[U]}(τ_{v_j} = 2) = 1/2. By Lemma 26 and since G_j has maximum degree ≤ ∆, we obtain a bound on the relevant conditional probabilities, and therefore, to prove (48), we only need to show the bound (55) on the expected correlation. For convenience, let F_j be the event that the Potts configuration σ̂_j is (b, M)-bad, i.e., F_j = {σ̂_j ∈ Ω_j(b, M)}, and denote by F̄_j the complementary event that σ̂_j ∉ Ω_j(b, M).
where the inequality follows from the bounds Corr_{G_j}(U_{σ̂_j}, u_j, v_j) ≤ 1 and µ_j(F_j) ≤ 1/n^{10} from (47).
Recall that 𝒫_{ℓ,j} denotes the set of paths of length ℓ that connect u_j and v_j in G_j. Consider also the r.v. Y_j, defined in terms of the path counts P_{ℓ,j}.
Recall that U_{σ̂_j} is a bichromatic class under σ̂_j which contains u_j and v_j (chosen uniformly at random among all such classes if there is more than one). Conditioned on the event σ̂_j ∉ Ω_j(b, M), every bichromatic component under σ̂_j has average growth b up to depth L = ⌈M log n⌉ and therefore, irrespective of the random choice of U_{σ̂_j}, we obtain by Lemma 25 that Corr_{G_j}(U_{σ̂_j}, u_j, v_j) ≤ Y_j. Note that, in applying Lemma 25, we used that P_{ℓ,j} = 0 for all ℓ ∈ [1, L_0]; this holds because the edge {u_j, v_j} does not belong to a short cycle in G (and therefore in G_{j−1} as well). We thus have (57), where the last inequality follows from 1/µ_{G_j}(F̄_j) ≤ 1/(1 − 1/n^{10}) ≤ 1 + 2/n^{10} (cf. (47)). Using (54) and the fact that L_0 ≥ max{ℓ_1, ℓ′} for all sufficiently large n, we obtain (58). Since G satisfies Item 1, the sum in (58) is less than 1/n^δ, and hence (57) and (58) give the desired bound for all sufficiently large n. Combining this with (56) yields (55), therefore concluding the proof of (48). This finishes the proof of Theorem 32.

7 Analysing RC on graphs with tree-like structure
In this section, we give the proof of Lemma 18. We begin by revisiting the uniqueness results of Häggström [16] on the (∆−1)-ary tree; then, we use these results to obtain Lemma 18 in Section 7.2.

7.1 Uniqueness on the (∆ − 1)-ary tree
Fix an integer ∆ ≥ 3. In this section, we review the results of Häggström [16] about uniqueness of random-cluster measures on the infinite (∆ − 1)-ary tree. In fact, it will be more relevant for us to consider the case of finite trees of large depth (rather than the infinite tree itself); this approach has also been followed in [15, Chapter 10] for the case ∆ = 3. As in Section 2.2, let T_∆ denote the infinite (∆ − 1)-ary tree with root vertex ρ. For an integer h ≥ 0, let T_h = (V_h, E_h) denote the subtree of T_∆ induced by the vertices at distance ≤ h from ρ and let L_h denote the leaves of T_h. We will consider the random-cluster distribution on T_h with the so-called wired boundary condition, where all the leaves are identified into a single vertex or, equivalently, all the leaves are connected to a vertex "at infinity". In particular, for S ⊆ E_h, let k*(S) denote the number of connected components in the graph with vertex set V_h ∪ {∞} and edge set S ∪ (L_h × {∞}); the purpose of the extra vertex ∞ and the edges L_h × {∞} connecting the leaves to ∞ is to capture the wired boundary condition that all leaves are in the same cluster. The "wired" RC distribution on T_h is given by ϕ*_h(S) = (1/Z*_h) · p^{|S|} (1 − p)^{|E_h| − |S|} q^{k*(S)} for S ⊆ E_h. (59) Denote by ρ ↔ ∞ the event that there exists an open path connecting the root ρ to infinity (or, equivalently, that the root is connected via an open path to some leaf). The following lemma is implicitly proved in [16] in the context of the infinite tree; we give an alternative proof following the approach in [15, Chapter 10], which is carried out there for the case ∆ = 3. (The proof is included for the sake of completeness, and the reader might want to skip it.) Lemma 33 ([16, Theorems 1.5 & 1.6]). Let ∆ ≥ 3, q ≥ 1 and p ∈ [0, 1]. Then, in the wired RC distribution on T_h, the probability of the event ρ ↔ ∞ converges as h grows, i.e., ϕ*_h(ρ ↔ ∞) → ϕ* as h → ∞. For p_c(q, ∆) as in (3), it holds that ϕ* = 0 if p < p_c(q, ∆) and ϕ* > 0 if p > p_c(q, ∆).
Proof of Lemma 33. For convenience, let d := ∆ − 1. Let Z*_{h,∞} denote the contribution to Z*_h from sets S ⊆ E_h such that ρ is connected to infinity, and Z*_{h,¬∞} the contribution from sets S ⊆ E_h such that ρ is not connected to infinity. Note that ϕ*_h(ρ ↔ ∞) = Z*_{h,∞}/Z*_h for all h ≥ 0. Moreover, with t := p/q + 1 − p, the ratio Z*_{h,∞}/Z*_h satisfies an explicit recursion in h, and therefore, as h goes to infinity, ϕ*_h(ρ ↔ ∞) converges to a limit ϕ*, which is the largest root in the interval [0, 1] of the corresponding fixed-point equation. After a change of variables (whose inverse transformation is also explicit), it follows that u is the largest root in the interval [1, 1/(1−p)] of the equation 1 + h(u) = 1/(1 − p), (60) where h(y) = (y − 1)(y^d + q − 1)/(y^d − y) is the same function as in (3). We next examine for which values of p it holds that the root u of (60) is strictly larger than 1 (note that ϕ* > 0 iff u > 1). Recall the definition of the value p_c(q, ∆) from (3). First, consider the case p < p_c(q, ∆). For the sake of contradiction, assume that u > 1. Then, we obtain from (60) that p ≥ p_c(q, ∆), a contradiction. Thus, ϕ* = 0 for all p < p_c(q, ∆). Next, consider the case p > p_c(q, ∆). Using the continuity of the function h in the interval (1, ∞), we obtain that there exists y > 1 such that p = 1 − 1/(1 + h(y)). Note that 1 + h(y) = 1/(1 − p), and we have that y ≤ 1 + h(y) for all y > 1, so in fact y ∈ (1, 1/(1−p)]. It follows that u, the largest root in the interval [1, 1/(1−p)] of the equation (60), satisfies u ≥ y > 1, and therefore ϕ* > 0.
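As a sanity check on this threshold behaviour (not part of the paper's argument), the wired measure on small trees can be computed by brute-force enumeration of edge subsets; for q = 1 the random-cluster model degenerates to independent bond percolation, where ϕ*_h(ρ ↔ ∞) obeys the explicit recursion u_h = 1 − (1 − p·u_{h−1})^d with u_0 = 1. A minimal Python sketch (function names and parameter values are ours, purely illustrative):

```python
def wired_rc_root_to_infinity(d, h, p, q):
    """Brute-force phi*_h(rho <-> infinity) on the d-ary tree of depth h,
    with all depth-h leaves wired to an extra vertex 'infinity'."""
    # Build the d-ary tree of depth h; vertices are tuples of child indices.
    nodes, edges, frontier = [()], [], [()]
    for _ in range(h):
        new_frontier = []
        for v in frontier:
            for i in range(d):
                c = v + (i,)
                nodes.append(c)
                edges.append((v, c))
                new_frontier.append(c)
        frontier = new_frontier
    leaves, INF, m = set(frontier), 'inf', len(edges)
    total = conn = 0.0
    for mask in range(1 << m):
        # Union-find over tree vertices plus infinity.
        parent = {v: v for v in nodes}
        parent[INF] = INF
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        def union(a, b):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
        for leaf in leaves:          # wired boundary: leaves joined to infinity
            union(leaf, INF)
        k_open = 0
        for i, (a, b) in enumerate(edges):
            if mask >> i & 1:
                k_open += 1
                union(a, b)
        comps = len({find(v) for v in parent})
        w = p**k_open * (1 - p)**(m - k_open) * q**comps
        total += w
        if find(()) == find(INF):    # root in the cluster of infinity
            conn += w
    return conn / total
```

For q = 1 and small depths the brute force agrees with the percolation recursion, and in the subcritical regime the connection probability visibly decays with h.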
We will need the following corollary of Lemma 33 for a slightly modified tree where the root has degree ∆ − 2. In particular, for an integer h ≥ 1, let T̂_h = (V̂_h, Ê_h) be the tree obtained by taking ∆ − 2 disjoint copies of T_{h−1} and joining their root vertices into a new vertex ρ (for h = 0, we let T̂_h be the single-vertex graph). Analogously to (59), we use L̂_h to denote the leaves of the tree and ϕ̂*_h to denote the wired measure on T̂_h where all the leaves are wired to infinity, i.e., ϕ̂*_h(S) = (1/Ẑ*_h) · p^{|S|} (1 − p)^{|Ê_h| − |S|} q^{k*(S)} for S ⊆ Ê_h, where k*(S) denotes the number of connected components in the graph (V̂_h ∪ {∞}, S ∪ (L̂_h × {∞})).

7.2 Analysing RC on disjoint trees whose roots are connected via a path
We are now in a position to prove Lemma 18, which we restate here for convenience. Lemma 18. Let ∆ ≥ 3 be an integer, q ≥ 1 and p < p_c(q, ∆). There exist constants K < 1/(∆ − 1) and ǫ > 0 such that the following holds for all sufficiently large integers ℓ and h.
Let G be a ∆-regular graph and P be a path with ℓ vertices whose h-graph-neighbourhood contains (1 − ǫ)ℓ isolated tree components. Let ϕ_G be the RC distribution on G with parameters p, q. Then, ϕ_G(all edges of P are open) ≤ K^ℓ. Let us first give the rough idea of the proof. For simplicity, we will consider the somewhat special case where ǫ = 0, but the argument can easily be adapted to account for small positive ǫ > 0. In particular, let H be the graph induced by the h-graph-neighbourhood of the path P, together with the edges of P. For ǫ = 0, the assumptions of the lemma imply that H is a union of ℓ disjoint trees, each isomorphic to T̂_h, whose roots are connected by a path. Using the monotonicity of the RC distribution (cf. Lemma 19), to upper bound the probability that P is open in ϕ_G, we can condition all the edges outside H to be open; let ϕ*_H be the conditioned probability distribution (note the analogy with the wired measure we considered in the previous section).
Consider first the graph F which is the disjoint union of the ℓ trees (i.e., F is obtained from H by removing the edges of the path P) and let ϕ*_F be the analogue of ϕ*_H (i.e., the RC distribution on G\P where all edges outside F are assumed to be open). Crucially, since p < p_c(q, ∆), we have by Lemma 33 that typically only about ǫ′ℓ of the roots are connected via an open path to infinity, where ǫ′ > 0 is a constant that can be made arbitrarily small by taking the depth h of the trees sufficiently large. Denote by R this (random) set of root vertices, so that |R| ≤ ǫ′ℓ with high probability. Now, we add the edges of the path P and consider how this reweights the random-cluster configuration. In particular, we focus on the edges of the path which are not incident to a root in R; let e be such an edge. If e is open, we get a factor of p/q in the weight of the random-cluster configuration (p because e is open and 1/q because the total number of connected components decreases by 1); if e is closed, we get a factor of 1 − p in the weight of the random-cluster configuration (because e is closed; note that the number of connected components stays the same in this case). Since |R| ≤ ǫ′ℓ, there are at least ℓ − 2ǫ′ℓ edges of the path P that are not incident to a vertex in R. Therefore the probability that all of them are open is roughly τ^{(1−2ǫ′)ℓ}, where τ := (p/q)/(p/q + (1 − p)). For p < p_c(q, ∆) it holds that τ < 1/(∆ − 1), and hence the lemma follows by taking ǫ′ > 0 to be a sufficiently small constant (to account also for the roughly 2^{ǫ′ℓ} choices of the set R).
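The elementary fact used here, namely that τ < 1/d exactly when p < q/(q + d − 1) (with d = ∆ − 1), follows by clearing denominators; the snippet below is an illustrative numerical check of this equivalence, not part of the proof:

```python
def tau(p, q):
    # Per-edge reweighting factor from the proof sketch:
    # probability that a path edge is open, given the surrounding weights.
    return (p / q) / (p / q + (1 - p))

def below_threshold(p, q, d):
    # Algebraically, tau(p, q) < 1/d  <=>  p * (q + d - 1) < q.
    return p < q / (q + d - 1)
```

Checking the two conditions against each other over a grid of parameters confirms the equivalence (grid points are chosen away from the exact threshold to avoid floating-point ties).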
Proof of Lemma 18. For convenience, let d := ∆ − 1. We begin by specifying the constant K and how large ℓ and h need to be.
Let τ := p/(p + q(1 − p)) and note that for all p < p_c(q, ∆) we have τ < 1/d (since p_c(q, ∆) ≤ q/(q + d − 1)). Let K be any constant satisfying τ < K < 1/d and let ǫ > 0 be a small constant such that, for ǫ′ := 10q^3 ǫ, it holds that τ^{1−ǫ′} q^{10ǫ′} < K (note that such an ǫ exists by considering the limit ǫ ↓ 0). Let ℓ be sufficiently large so that ǫ′ℓ ≥ ǫℓ ≥ 2. As in Section 7.1, for an integer h ≥ 0, let T̂_h = (V̂_h, Ê_h) denote the subtree of the (∆ − 1)-ary tree with height h where the root ρ has degree ∆ − 2 (and every other non-leaf vertex has degree ∆). Let Ẑ*_h denote the partition function of the random-cluster model where all the leaves are connected to infinity (cf. (61)). Let Ẑ*_{h,∞} be the contribution to Ẑ*_h from sets S ⊆ Ê_h such that ρ is connected to infinity, and Ẑ*_{h,¬∞} the contribution from sets S ⊆ Ê_h such that ρ is not connected to infinity. Since p < p_c(q, ∆), we have by Corollary 35 that (62) holds for all sufficiently large h. We are now ready to proceed to the proof of the lemma. Consider a path P with vertices u_1, ..., u_ℓ whose h-graph-neighbourhood contains at least (1 − ǫ)ℓ isolated tree components. By definition, an isolated tree component contains exactly one of the vertices u_1, ..., u_ℓ, and therefore the set U = {u_i | u_i belongs to an isolated tree component and u_i ≠ u_1, u_ℓ} satisfies |U| ≥ (1 − ǫ)ℓ − 2; moreover, for every u ∈ U, the component of u is a copy of T̂_h rooted at u. Indeed, for u ∈ U, denote by C_u the component of the h-graph-neighbourhood of the path P that the vertex u belongs to. Since u ∈ U, C_u is isolated and therefore the vertex set of C_u is precisely Γ_h(G\P, u) and C_u is an induced subgraph of G. It remains to observe that C_u is a tree (since u ∈ U), every non-leaf vertex in C_u different from u has degree ∆ (since the graph G is ∆-regular) and u itself has degree ∆ − 2 (since u ≠ u_1, u_ℓ). Let F be the subgraph of G induced by the vertex set ∪_{u∈U} Γ_h(G\P, u). Note that F is a disjoint union of copies of T̂_h.
For convenience, denote by V_F, E_F the vertex and edge set of F, and similarly denote by V_P, E_P the corresponding sets of the path P. Note that E_F is disjoint from E_P. In the following, we focus on lower bounding the inner sum in (66). Recall that the graph F consists of |U| disjoint copies of the tree T̂_h, each rooted at a vertex in U. For an integer r = 0, ..., |U|, let Ω_F(r) = {S′ ⊆ E_F | exactly r vertices in U are connected to ∞ in the graph with vertex set V_F ∪ {∞} and edge set S′ ∪ (L × {∞})}.
Then, we have (67). (Note that the r.h.s. in (67) counts the component containing ∞ a total of |U| − r + 1 times, while k*(S′) counts it only once for each S′ ⊆ E_F, which explains the need for the factor q^{|U|−r} on the l.h.s.) We therefore have (68) (using q ≥ 1). Plugging (68) into (66) and using the binomial expansion yields the first inequality in (64), as wanted.
To prove the second inequality in (64), consider as before the set of configurations Ω_F(r) in which exactly r roots of the trees are connected to infinity, and let Ω_H(r) = {S′ ∪ E_P | S′ ∈ Ω_F(r)}. For S ∈ Ω_H(r), we will show (69). The equalities are an immediate consequence of the equalities in (65) and the fact that, by the definition of Ω_H(r), we have E_P ⊆ S for all S ∈ Ω_H(r). To justify the inequality, note that there are exactly M = r + ℓ − |U| vertices of the path P which are connected to infinity. It follows that there are at least ℓ − 1 − 2M = 2(|U| − r) − (ℓ + 1) edges in S ∩ E_P = E_P whose endpoints are not connected to infinity; deleting any of these edges causes the number of components to increase by one. Using (69) and the fact that q ≥ 1, we can bound Q as in (70). Using (67) again, and recalling that ǫ′ = 10q^3 ǫ, we note using (62) that for all r ≥ ǫ′ℓ the corresponding contribution is small. We also have that |U| ≥ (1 − 2ǫ)ℓ, so ℓ − |U| ≤ 2ǫℓ. Therefore, for all sufficiently large ℓ we obtain the required bound. Plugging this into (70) yields the second inequality in (64), as needed.
8 Analysing the Potts model on graphs with tree-like structure
The goal of this section is to prove Lemma 27.

8.1 Analysing Potts on trees
Fix an integer ∆ ≥ 3. As in Section 7.1, we use T_∆ to denote the infinite (∆ − 1)-ary tree with root vertex ρ. For an integer h ≥ 0, let T_h = (V_h, E_h) denote the subtree of T_∆ induced by the vertices at distance ≤ h from ρ and let L_h denote the leaves of T_h. Recall that uniqueness on T_∆ implies that root-to-leaves correlations on T_h tend to 0 as h → ∞, cf. Definition 3. The following lemma extends these decay properties to arbitrary subtrees of T_h; this was proved in [8] in the case of the colourings model and the proof for the Potts model is analogous.
Lemma 36. Let ∆, q ≥ 3 be integers, and B > 0 be in the uniqueness regime of the (∆ − 1)-ary tree. There exists a function ϑ : N → R ≥0 with ϑ(h) → 0 as h → ∞ such that the following holds for any integer h ≥ 0.
Let T′_h be an arbitrary subtree of T_h containing the root ρ and let L′_h be the set of vertices in T′_h at distance exactly h from ρ. Then, the decay bound holds for any colour c ∈ [q] and any configuration τ : L′_h → [q]. Proof. We will show that the statement of the lemma holds with the function ϑ(·) given by ϑ(0) = 1 and ϑ(h) defined as the corresponding maximum over boundary configurations. Note that, since B is assumed to be in the uniqueness regime of the (∆ − 1)-ary tree, we have by definition that ϑ(h) → 0 as h → ∞. We first show by induction on h that, for all h ≥ 0, for all subtrees T′_h of T_h containing the root ρ, for all configurations τ : L′_h → [q] and any colour c ∈ [q], the bound (71) holds. For h = 0 the result is trivial. Suppose that h ≥ 1 and that the result holds for all integers less than h; we prove it for h as well. Let U = {u_1, ..., u_{∆−1}} be the children of ρ in T_h and let U′ ⊆ U be the children of ρ in T′_h. For a vertex u ∈ U, denote by T_h(u) the subtree of T_h rooted at u and by L_h(u) the vertices in L_h that belong to the tree T_h(u). Similarly, for a vertex u ∈ U′, denote by T′_h(u) the subtree of T′_h rooted at u and by L′_h(u) the vertices in L′_h that belong to the tree T′_h(u). Using standard tree recursions (see for example [11, Lemma 19]), we obtain (72) and (73). For every u ∈ U\U′ we have L′_h(u) = ∅ and therefore, using the symmetry among the colours, the corresponding root marginal equals 1/q for every c′ ∈ [q]. For u ∈ U′, T_h(u) is isomorphic to T_{h−1} and T′_h(u) is a subtree of T_h(u), so by the induction hypothesis the decay bound with ϑ(h − 1) holds for any colour c ∈ [q]. Combining (72) with these bounds completes the induction. To complete the proof, it remains to relate the root marginal under boundary conditions on L′_h to the quantity bounded in (71); by the definition of the function ϑ(·), this holds for any boundary configuration η. Combining this with (71) yields the lemma.
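The "standard tree recursions" invoked above can be made concrete. The following sketch (our own illustration, with hypothetical function names) computes the root marginal of a Potts model on a small tree both by the bottom-up recursion and by brute-force enumeration; it also exhibits the fact used above that a subtree with free leaves contributes a uniform message:

```python
from itertools import product

def potts_root_marginal_bruteforce(children, B, q, boundary):
    """Exact root marginal by enumeration. children: dict vertex -> list of
    children, with vertex 0 the root; boundary: dict vertex -> fixed colour.
    An edge between equal colours carries weight B, otherwise weight 1."""
    verts = sorted(children)
    edges = [(v, u) for v in children for u in children[v]]
    free = [v for v in verts if v not in boundary]
    weights = [0.0] * q
    for assign in product(range(q), repeat=len(free)):
        col = dict(boundary)
        col.update(zip(free, assign))
        w = 1.0
        for a, b in edges:
            if col[a] == col[b]:
                w *= B
        weights[col[0]] += w
    s = sum(weights)
    return [w / s for w in weights]

def potts_root_marginal_recursion(children, B, q, boundary, v=0):
    """Standard bottom-up tree recursion for the same marginal: the message
    of a vertex is the product over children of the edge-summed messages."""
    if v in boundary:
        vec = [1.0 if c == boundary[v] else 0.0 for c in range(q)]
    else:
        vec = [1.0] * q
    for u in children[v]:
        child = potts_root_marginal_recursion(children, B, q, boundary, u)
        vec = [vec[c] * sum((B if c == cp else 1.0) * child[cp]
                            for cp in range(q)) for c in range(q)]
    s = sum(vec)
    return [x / s for x in vec]
```

On any tree the two computations agree exactly, and a free leaf returns the uniform vector, matching the symmetry argument for u ∈ U\U′.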

8.2 Analysing Potts on disjoint trees whose roots are connected via a path
To prove Lemma 27, we will need the following lemmas.
Lemma 37. Let q ≥ 3 and B ∈ (0, 1), and set χ := (1 + B)/(B + q − 1). Let P be a path with ℓ ≥ 2 vertices and endpoints u, v. Then, the stated bounds hold for arbitrary colours c, c′ ∈ [q]. Proof. Let z be the neighbour of v on P (note that if ℓ = 2 then z = u). We have (76). Note also that for arbitrary colours c, c′, c″ ∈ [q] the corresponding conditional probabilities are comparable. Combining this with (76), we obtain that µ_P(σ_v = c′ | σ_u = c) ≥ B/(B + q − 1), and hence the lower bound (77).
Let c_1, c_2 ∈ [q] be distinct colours in [q] and let E(c_1, c_2) be the event that every vertex in P is coloured with c_1 or c_2. The lemma will follow by showing that µ_P(E(c_1, c_2)) = (2/q) · χ^{ℓ−1}. (78) Let us briefly conclude the lemma assuming (78). Indeed, if c ≠ c′, then applying (78) for c_1 = c and c_2 = c′ and using the lower bound in (77) we obtain the first bound; while, if c = c′, we obtain the second bound by summing (78) for c_1 = c and the q − 1 possible values of c_2. It remains to prove (78). For convenience, denote by w_1, ..., w_ℓ the vertices of P in order, so that u = w_1 and v = w_ℓ, and let C = {c_1, c_2}. We have µ_P(σ_{w_1} ∈ C) = 2/q. For i = 2, ..., ℓ, let P_i be the path induced by the vertices w_{i−1}, ..., w_ℓ. Since µ_{P_i}(σ_{w_i} ∈ C) = µ_{P_i}(σ_{w_{i−1}} ∈ C) = 2/q, we have by Bayes' rule that the conditional probabilities telescope. Combining these, we obtain (78), therefore concluding the proof of Lemma 37.
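The chain-rule computation behind (78) gives the exact value µ_P(E(c_1, c_2)) = (2/q)·χ^{ℓ−1} for the free Potts measure on a path: the first vertex lands in C = {c_1, c_2} with probability 2/q, and each subsequent vertex stays in C with conditional probability (1 + B)/(B + q − 1) = χ. The snippet below verifies this by enumeration (an illustrative check of ours, not from the paper):

```python
from itertools import product

def path_bichromatic_prob(ell, q, B, c1=0, c2=1):
    """Exact probability, under the free Potts measure on a path with ell
    vertices, that every vertex receives colour c1 or c2. An edge between
    equal colours carries weight B, otherwise weight 1."""
    total = hit = 0.0
    for col in product(range(q), repeat=ell):
        w = 1.0
        for a, b in zip(col, col[1:]):
            if a == b:
                w *= B
        total += w
        if set(col) <= {c1, c2}:
            hit += w
    return hit / total
```

For instance, with q = 3 and B = 0.5 we have χ = 1.5/2.5 = 0.6, and the enumerated value matches (2/q)·χ^{ℓ−1} exactly.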
We will use the following corollary of Lemma 37.
Corollary 38. Let q ≥ 3 and B ∈ (0, 1), and set χ := (1 + B)/(B + q − 1). Let P be a path with ℓ ≥ 2 vertices and let Λ be a subset of the vertices which includes the endpoints of the path. Then, the stated bound holds for any configuration τ : Λ → [q]. Proof. Let t := |Λ| and denote the vertices in Λ by u_1, ..., u_t in the order that they appear on the path. Note that u_1 and u_t are the endpoints of P (since by assumption Λ includes the endpoints of P). For i = 1, 2, ..., t − 1, let P_i be the path induced by the vertices between u_i and u_{i+1} and let ℓ_i be the number of vertices in P_i. By Lemma 37 applied to each P_i, and since Σ_i ℓ_i ≥ ℓ and χ, B ∈ (0, 1), the product of the segment bounds gives the claimed bound. This finishes the proof.
We are now ready to prove Lemma 27, which we restate here for convenience.
Let G be a graph of maximum degree ∆ and P be a path with ℓ vertices whose h-graphneighbourhood contains (1 − ǫ)ℓ isolated tree components. Let µ G be the Potts measure on G with parameter B. Then, µ G (path P is bichromatic) ≤ K ℓ .
Proof. Let χ := (1 + B)/(B + q − 1) and consider an arbitrary ǫ′ > 0. We begin by specifying the constants K, ǫ and how large ℓ and h need to be. In particular, let K be any constant satisfying χ < K < χ + ǫ′ and let ǫ > 0 be a small constant satisfying (79); note that such an ǫ exists by considering the limit ǫ ↓ 0. Let ℓ be sufficiently large so that (80) holds. Finally, let ϑ(·) be the function in Lemma 36, so that (81) holds for all sufficiently large h. We are now ready to proceed to the proof of the lemma. Let G be a graph and consider a path P in G, with vertices u_1, ..., u_ℓ, whose h-graph-neighbourhood contains at least (1 − ǫ)ℓ isolated tree components. By definition, an isolated tree component contains exactly one of the vertices u_1, ..., u_ℓ, and therefore the set U = {u_i | u_i belongs to an isolated tree component and u_i ≠ u_1, u_ℓ} satisfies |U| ≥ (1 − ǫ)ℓ − 2. Note that we exclude the endpoints of the path P from U, even if they belong to isolated tree components. Let also Ū denote the set {u_1, ..., u_ℓ}\U, so that |Ū| ≤ 2ǫℓ. Recall that T_h is the subtree of the (∆ − 1)-ary tree consisting of the vertices at distance at most h from the root. By the definition of the set U, we have that for u ∈ U, Γ_h(G\P, u) induces a subgraph in G which is a subtree of T_h (cf. (82)).
For u ∈ U, let for convenience V_u = Γ_h(G\P, u), let F_u be the subgraph of G induced on V_u, and let L_u be the set of vertices that are at distance h from u in F_u. Note that F_u is a tree rooted at u all of whose vertices are at distance at most h from u; L_u is thus the set of all leaves in F_u which are at distance h from u (note that there could be other leaves which are closer to u, but these will not matter). Let F be the union of the subgraphs F_u for u ∈ U; note that F is a disjoint union of trees (each of which is a subtree of T_h by (82)). Let also L := ∪_{u∈U} L_u. We will also denote by V_P, E_P the vertex and edge set of the path P. Note that E_F is disjoint from E_P (though V_P and V_F intersect, at the vertices in U). Finally, let H = (V_H, E_H) be the subgraph of G with vertex set V_P ∪ V_F and edge set E_P ∪ E_F.
To bound the probability that P is bichromatic, we will condition on a worst-case boundary configuration on L and Ū. In particular, we have the corresponding conditional bound. To bound the r.h.s., fix arbitrary configurations τ : L → [q], τ′ : Ū → [q]; the lemma will follow (since τ, τ′ are arbitrary) by showing that, for arbitrary colours c_1, c_2 ∈ [q], the bound (83) holds, where E(c_1, c_2) is the event that each vertex in P is coloured with either c_1 or c_2. Since Ū includes the endpoints of the path P, by Corollary 38 we obtain (84). Let Z_P be the partition function of the path P and Z_P(c_1, c_2) the contribution to Z_P from configurations in which P is coloured with c_1 or c_2 only. Then, (84) translates into Z_P(c_1, c_2)/Z_P ≤ (4q/(χB))^{|Ū|} χ^{|U|}.
The trees hanging from the vertices in U reweight the probability that the path is bichromatic, but do not cause significant distortion since the states of the roots are roughly uniformly distributed (because of uniqueness on the tree). To quantify this, for u ∈ U and a colour c ∈ [q], let Z_u(c) be the corresponding restricted partition function, and let Z_u = Σ_{c∈[q]} Z_u(c). By Lemma 36 and the choice of h in (81), we have that (85) holds for all c ∈ [q]. We can now write µ_H(E(c_1, c_2) | σ_L = τ, σ_Ū = τ′) as a ratio of partition functions.
Dividing both the numerator and the denominator by ∏_{u∈U} Z_u, we obtain using (85) the corresponding bound. We have |U| ≥ (1 − 2ǫ)ℓ and |Ū| ≤ 2ǫℓ, so from the choice of ǫ and ℓ in (79) and (80) we obtain exactly (83), as wanted. This concludes the proof of Lemma 27.

9 Correlation decay and sampling for antiferromagnetic Ising
In this section, we prove Theorem 24 and Lemma 25. The proofs follow relatively easily from correlation-decay bounds on trees appearing in [27]; the bounds there are stated for the ferromagnetic Ising model, but a simple translation to the antiferromagnetic case allows us to conclude the desired results. In Section 9.1, we import the results from the literature that we need. We then give the proof of Theorem 24 in Section 9.2 and the proof of Lemma 25 in Section 9.3.

9.1 Preliminaries
Following [27], for a graph G and a vertex u in G, we consider the self-avoiding walk tree T = T_SAW(G, u), which consists of all paths starting from u and not intersecting themselves, except possibly at the terminal vertex of the path. The following lemma originates in the work of Weitz for the hard-core model [33]; it is well known that the lemma holds more generally for any 2-state system. The particular version we state here is close to [27, Lemma 13].
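The construction of T_SAW can be sketched programmatically; the following illustrative Python (our own, assuming an adjacency-list input) enumerates the walks that form the vertices of T_SAW, treating a walk whose terminal vertex revisits an earlier vertex (thereby closing a cycle) as a leaf:

```python
def saw_tree_size(adj, u):
    """Count the vertices of T_SAW(G, u): walks starting at u that are
    self-avoiding, except that the terminal vertex may close a cycle by
    revisiting an earlier vertex (necessarily via a previously unused edge).

    adj: dict mapping each vertex to the list of its neighbours."""
    count = 0
    stack = [(u,)]              # each T_SAW vertex is a walk starting at u
    while stack:
        walk = stack.pop()
        count += 1
        last = walk[-1]
        if len(walk) >= 2 and last in walk[:-1]:
            continue            # walk has closed a cycle: a leaf of T_SAW
        for w in adj[last]:
            if len(walk) >= 2 and w == walk[-2]:
                continue        # would retraverse the edge just used
            stack.append(walk + (w,))
    return count
```

For example, on a triangle the tree rooted at any vertex has 7 nodes (the trivial walk, two walks of each length 2 and 3, and two walks closing the triangle), while on a path the tree is just a copy of the path from u.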
Lemma 39 (see, e.g., [27,Lemma 13]). Let G = (V, E) be a graph and u be a vertex in G. Consider the self-avoiding walk tree T = T SAW (G, u) starting from u and denote by A the leaves of the tree. Then, there is a configuration η : A → {1, 2} (described in [33]) such that the following holds for any set Λ ⊆ V and any configuration τ : Λ → {1, 2}. Let U Λ be the set of vertices in T which correspond to vertices in Λ and let η ′ : U Λ \ A → {1, 2} be the configuration where each vertex in U Λ \A inherits the state of the corresponding vertex in Λ under τ . Then, The following lemma follows from a strong spatial mixing result in [27] for the ferromagnetic Ising model, which in turn builds upon a lemma from [3].
Lemma 40. Let B ∈ (0, 1). Let T = (V, E) be a tree, Λ ⊆ V be a subset of the vertices and u be an arbitrary vertex. Let τ 1 , τ 2 : Λ → {1, 2} be two configurations on Λ which differ only on a subset U ⊆ Λ. Then, where dist(u, v) denotes the distance between u and v in T .
Proof. It is well known that, on bipartite graphs G = (V, E), there is a measure-preserving bijection between configurations of the antiferromagnetic Ising model with parameter B ∈ (0, 1) and configurations of the ferromagnetic Ising model with parameter 1/B, obtained by flipping the states of the vertices on one side of the bipartition. The desired inequality therefore follows from the strong spatial mixing result for the ferromagnetic Ising model given in [27, Lemma 14]. To translate the parameterisation of that result, note that in [27] the weight of a (ferromagnetic) Ising configuration σ is parameterised to be proportional to exp(2βm(σ)); therefore 1/B = e^{2β}, so tanh β = (1 − B)/(1 + B). Using this translation, we obtain the desired inequality.

9.2 Proof of Theorem 24
To prove Theorem 24, we will use the following algorithm that allows us to compute conditional marginal probabilities with very small absolute error.
Lemma 41. Let B ∈ (0, 1) and b > 0 be constants such that b(1 − B)/(1 + B) < 1, and let ∆ ≥ 3 be an integer. Then, there exists M_0 > 0 such that the following holds for all M > M_0.
There is a polynomial-time algorithm that, on input: (i) an n-vertex graph G = (V, E) with maximum degree at most ∆ and average growth b up to depth L = ⌈M log n⌉, (ii) a subset Λ ⊆ V with a configuration τ : Λ → {1, 2}, and (iii) a vertex u ∈ V\Λ, outputs a number p̂ ∈ [0, 1] such that |p̂ − p| = O(1/n^{11}), where p := π_G(σ_u = 1 | σ_Λ = τ), i.e., p̂ is within absolute error O(1/n^{11}) from the marginal probability that σ_u = 1 conditioned on σ_Λ = τ, where σ is drawn from the Ising distribution π_G on G with parameter B.
Proof. Let M 0 be a large constant so that for n ≥ 2 and L 0 := ⌈M 0 log n⌉, it holds that .
Note that such an M_0 exists since b(1 − B)/(1 + B) < 1. Fix M to be an arbitrary constant larger than M_0. Let G be an arbitrary n-vertex graph with average growth b up to depth L = ⌈M log n⌉, let Λ ⊆ V be a subset of the vertices, τ : Λ → {1, 2} a configuration on Λ, and u ∈ V\Λ a vertex. Consider the self-avoiding walk tree T = T_SAW(G, u) starting from u, and denote by A the leaves of the tree. By Lemma 39, there is a configuration η : A → {1, 2} such that (87) holds, where U := U_Λ is the set of vertices in T that correspond to some vertex in Λ and η′ is the assignment on U\A inherited from τ (see Lemma 39 for details). For convenience, let F := U ∪ A and let ζ : F → {1, 2} be the configuration which agrees with η on A and with η′ on U\A, so that (87) can be rewritten as π_G(σ_u = 1 | σ_Λ = τ) = π_T(σ_u = 1 | σ_F = ζ).
Note that p̂ can be computed in polynomial time, since the tree T′ and the restriction of the configuration ζ to F′ can be constructed in polynomial time (T′ being a tree of size O(∆^L)). So, the lemma will follow by showing the bound (88) on |p̂ − p|.
To prove this, let J be the set of vertices in V′\F′ whose distance from u is exactly L − 1 in T, and note that |J| ≤ b^L since G has average growth b up to depth L (cf. Definition 11). Note also that, conditioned on the configurations on J and F′, the probability that σ_u = 1 depends only on T′ (and not on the configuration on the rest of the tree T). Using the law of total probability, we can therefore expand p as a convex combination over the configurations on J. Therefore, (88) will follow by showing that, for any configuration ι : J → {1, 2}, the conditional probability π_{T′}(σ_u = 1 | σ_{F′} = ζ_{F′}, σ_J = ι) is within the required error of p̂.
Using Lemma 41, the proof of Theorem 24 follows by standard techniques. We restate it here for convenience.
Theorem 24. Let B ∈ (0, 1) and b > 0 be constants such that b(1 − B)/(1 + B) < 1, and let ∆ ≥ 3 be an integer. Then, there exists M_0 > 0 such that the following holds for all M > M_0.
There is a polynomial-time algorithm that, on input an n-vertex graph G with maximum degree at most ∆ and average growth b up to depth L = ⌈M log n⌉, outputs a configuration τ : V → {1, 2} whose distribution ν_τ is within total variation distance 1/n^{10} from the Ising distribution on G with parameter B, i.e., ‖ν_τ − π_G‖_TV ≤ 1/n^{10}.
Moreover, the algorithm, when given as additional input two vertices u and v in G, outputs a configuration τ : V → {1, 2} such that τ_u = 1 and τ_v = 2, and whose distribution ν_τ satisfies ‖ν_τ − π^{1,2}_{G,u,v}‖_TV ≤ 1/n^{10}, where π^{1,2}_{G,u,v} is the Ising distribution on G conditioned on u having state 1 and v having state 2.
Proof of Theorem 24. Denote by v_1, v_2, ..., v_n the vertices of G. The algorithm samples the state s_i of vertex v_i sequentially for i = 1, ..., n. We give the details only for the first part of the theorem; the proof of the second part is completely analogous (namely, it suffices to assume in the following that we first set v_1 = u, v_2 = v, then fix the states s_1 = 1, s_2 = 2, and finally sample the states s_i for i = 3, ..., n). Assume that, at some time i = 1, ..., n, we have sampled the states s_1, ..., s_{i−1} (which can take arbitrary values in {1, 2}). Using the algorithm of Lemma 41, we obtain in polynomial time numbers p̂_i(1), p̂_i(2) ∈ [0, 1] such that p̂_i(1) + p̂_i(2) = 1 and, for s ∈ {1, 2}, p̂_i(s) is within absolute error O(1/n^{11}) of a_i(s) := π_G(σ_{v_i} = s | σ_{v_1} = s_1, ..., σ_{v_{i−1}} = s_{i−1}). (91) We then sample the state s_i by letting s_i = 1 with probability p̂_i(1), and s_i = 2 otherwise (note that s_i = 2 with probability p̂_i(2)). Denote by τ the final configuration and by ν_τ its distribution. We will show that (92) holds for any configuration η : V → {1, 2}; summing over η then gives ‖ν_τ − π_G‖_TV ≤ 1/n^{10}, which proves the first part of the theorem; the second part follows analogously. To prove (92), fix an arbitrary configuration η : V → {1, 2} and let p_{i,η} denote the conditional marginal π_G(σ_{v_i} = η_{v_i} | σ_{v_1} = η_{v_1}, ..., σ_{v_{i−1}} = η_{v_{i−1}}). Note that π_G(η) = ∏_{i=1}^{n} p_{i,η} and ν_τ(η) = ∏_{i=1}^{n} p̂_i(η_{v_i}).
Moreover, from (91), we have the corresponding bound for all i = 1, ..., n, where the last inequality follows from the lower bound of Lemma 26. Using the inequalities 1 − x ≤ e^{−x} ≤ 1 − x/2, which hold for all x ∈ [0, 1/2], we obtain that, for ǫ = 1/n^{11}, it holds that e^{−ǫ} p_{i,η} ≤ p̂_i(η_{v_i}) ≤ e^{ǫ} p_{i,η}.
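The exactness of this sequential scheme when the conditional marginals are exact is just the chain rule π_G(η) = ∏_i p_{i,η}. The following illustrative check (our own, on a 3-vertex instance with hypothetical parameters) verifies this identity by brute force:

```python
from itertools import product

def ising_dist(edges, n, B):
    """Exact Ising distribution on n vertices: weight B per monochromatic
    edge, states in {1, 2}."""
    dist = {}
    for sigma in product((1, 2), repeat=n):
        w = 1.0
        for a, b in edges:
            if sigma[a] == sigma[b]:
                w *= B
        dist[sigma] = w
    Z = sum(dist.values())
    return {s: w / Z for s, w in dist.items()}

def cond_marginal(dist, i, s, prefix):
    """pi(sigma_{v_i} = s | sigma on v_1..v_{i-1} equals prefix),
    computed by direct summation over the exact distribution."""
    num = sum(p for sig, p in dist.items()
              if sig[:i] == prefix and sig[i] == s)
    den = sum(p for sig, p in dist.items() if sig[:i] == prefix)
    return num / den
```

Multiplying the exact conditional marginals along any vertex ordering reproduces π_G(η) exactly; the O(1/n^{11}) errors of Lemma 41 then accumulate to the stated total variation bound.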

9.3 Proof of Lemma 25
In this section, we give the proof of Lemma 25, which we restate here for convenience.
Lemma 25. Let $B \in (0, 1)$ and $b > 0$ be constants such that $b\,\frac{1-B}{1+B} < 1$. Then, there exists $M'_0 > 0$ such that the following holds for all $M > M'_0$. Let $G$ be an $n$-vertex graph with average growth $b$ up to depth $L = \lceil M \log n \rceil$, and let $u, v$ be distinct vertices in $G$. Then
$$\Big|\pi_G(\sigma_u = 1 \mid \sigma_v = 1) - \pi_G(\sigma_u = 1 \mid \sigma_v = 2)\Big| \le \frac{1}{n^{10}} + \sum_{\ell \ge 2} P_\ell(G, u, v)\, \Big(\frac{1-B}{1+B}\Big)^{\ell-1},$$
where $P_\ell(G, u, v)$ is the number of paths with $\ell$ vertices in $G$ that connect $u$ and $v$.
Proof. Let $M'_0 > 0$ be a sufficiently large constant so that, for all $M > M'_0$ and $L = \lceil M \log n \rceil$, it holds that
$$b^L \Big(\frac{1-B}{1+B}\Big)^{L-1} \le \frac{1}{2n^{10}}. \tag{94}$$
Note that such a constant exists since $b\,\frac{1-B}{1+B} < 1$. Fix arbitrary $M > M'_0$. Let $G$ be an arbitrary graph with average growth $b$ up to depth $L = \lceil M \log n \rceil$, and let $u, v$ be distinct vertices in $G$.
Consider the self-avoiding walk tree $T = T_{\mathrm{SAW}}(G, u)$ rooted at $u$, and denote by $A$ the leaves of the tree and by $U$ the set of vertices in $T$ that correspond to $v$. For a subset of vertices $W$ of the tree, we denote by $\sigma_W = 1$ the event that all vertices in $W$ have state 1, and analogously for $\sigma_W = 2$. By Lemma 39, there is a configuration $\eta : A \to \{1, 2\}$ such that, for $s \in \{1, 2\}$, it holds that
$$\pi_G(\sigma_u = 1 \mid \sigma_v = s) = p_s, \quad \text{where } p_s := \pi_T\big(\sigma_u = 1 \mid \sigma_A = \eta,\, \sigma_{U \setminus A} = s\big). \tag{95}$$
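As an illustration of the combinatorial skeleton of $T_{\mathrm{SAW}}(G, u)$, the following sketch (a hypothetical helper, not part of the paper's algorithm) identifies each tree vertex with a non-backtracking walk from $u$; a walk that revisits a vertex closes a cycle and becomes a leaf. Weitz's boundary conditions on these cycle-closing leaves, which determine the configuration $\eta$, are not modeled here.

```python
def saw_tree_paths(adj, u, max_depth):
    """Enumerate the vertices of the self-avoiding walk tree T_SAW(G, u).

    Returns a list of (walk, is_leaf) pairs, where each walk is a tuple
    of graph vertices starting at u.  Children of a walk extend it by a
    neighbour of its endpoint other than the previous vertex; a walk
    that revisits an earlier vertex closes a cycle and is a leaf, as is
    any walk exceeding max_depth vertices.
    """
    tree = []
    stack = [(u,)]
    while stack:
        walk = stack.pop()
        closes_cycle = walk[-1] in walk[:-1]
        is_leaf = closes_cycle or len(walk) > max_depth
        tree.append((walk, is_leaf))
        if not is_leaf:
            for w in adj[walk[-1]]:
                if len(walk) < 2 or w != walk[-2]:  # no immediate backtracking
                    stack.append(walk + (w,))
    return tree
```

On a triangle, for instance, the tree rooted at a vertex has two branches, each ending in a cycle-closing leaf.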
Let $T'$ denote the truncation of $T$ at depth $L - 1$, i.e., the subtree of $T$ induced by the vertices at distance at most $L - 1$ from $u$; denote by $V'$ its vertex set, and let $A' = A \cap V'$ and $U' = U \cap V'$. For $s \in \{1, 2\}$, let $p'_s := \pi_{T'}\big(\sigma_u = 1 \mid \sigma_{A'} = \eta_{A'},\, \sigma_{U' \setminus A'} = s\big)$. We will show that
$$|p'_1 - p'_2| \le \sum_{\ell \ge 2} P_\ell(G, u, v)\, \Big(\frac{1-B}{1+B}\Big)^{\ell-1}, \tag{96}$$
and that, for $s \in \{1, 2\}$,
$$|p_s - p'_s| \le 1/(2n^{10}). \tag{97}$$
Assuming these for the moment, we obtain by (95) and the triangle inequality that
$$\big|\pi_G(\sigma_u = 1 \mid \sigma_v = 1) - \pi_G(\sigma_u = 1 \mid \sigma_v = 2)\big| = |p_1 - p_2| \le |p_1 - p'_1| + |p'_1 - p'_2| + |p'_2 - p_2| \le \frac{1}{n^{10}} + \sum_{\ell \ge 2} P_\ell(G, u, v)\, \Big(\frac{1-B}{1+B}\Big)^{\ell-1},$$
thus proving the lemma. It thus remains to prove (96) and (97).
To prove (96), note that by Lemma 40 we have that
$$|p'_1 - p'_2| \le \sum_{w \in U' \setminus A'} \Big(\frac{1-B}{1+B}\Big)^{\mathrm{dist}(u, w)} \le \sum_{\ell \ge 2} P_\ell(G, u, v)\, \Big(\frac{1-B}{1+B}\Big)^{\ell-1},$$
where the last inequality follows from observing that each vertex $w \in U' \setminus A'$ with $\mathrm{dist}(u, w) = \ell$ corresponds to a distinct path with $\ell + 1$ vertices between $u$ and $v$ in $G$. This proves (96).
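The path-counting bound just derived can be evaluated directly on small graphs. The sketch below (with hypothetical helper names) enumerates simple paths between $u$ and $v$ by depth-first search and evaluates $\sum_{\ell \ge 2} P_\ell(G, u, v)\,\big(\tfrac{1-B}{1+B}\big)^{\ell-1}$; exhaustive path enumeration is of course exponential in general and is meant only to illustrate the quantity, not the algorithm.

```python
def path_counts(adj, u, v, max_len):
    """Return {ell: P_ell(G, u, v)}: counts of simple paths from u to v
    with ell vertices, for ell <= max_len."""
    counts = {}
    def dfs(walk):
        if walk[-1] == v:
            counts[len(walk)] = counts.get(len(walk), 0) + 1
            return  # a simple path ends on its first arrival at v
        if len(walk) == max_len:
            return
        for w in adj[walk[-1]]:
            if w not in walk:  # keep the walk self-avoiding
                dfs(walk + (w,))
    dfs((u,))
    return counts

def influence_bound(adj, u, v, B, max_len):
    """Evaluate sum_{ell>=2} P_ell(G,u,v) * ((1-B)/(1+B))**(ell-1)."""
    rate = (1 - B) / (1 + B)
    return sum(c * rate ** (ell - 1)
               for ell, c in path_counts(adj, u, v, max_len).items())
```

On a 4-cycle with $u$ and $v$ antipodal there are exactly two paths, each with three vertices, so the bound equals $2\big(\tfrac{1-B}{1+B}\big)^2$.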
To prove (97), we focus on showing that $|p_1 - p'_1| \le 1/(2n^{10})$, the other inequality being completely analogous. We follow closely a similar argument which was presented in the proof of Lemma 41. Let $J$ be the set of vertices in $V' \setminus (U' \cup A')$ whose distance from $u$ in $T$ is exactly $L - 1$, and note that $|J| \le b^L$ since $G$ has average growth $b$ up to depth $L$ (cf. Definition 11). Note also that, conditioned on the configurations on $J$ and $U' \cup A'$, the probability that $\sigma_u = 1$ depends only on $T'$ (and not on the configuration on the rest of the tree $T$), i.e., we can expand $p_1$ as
$$p_1 = \sum_{\iota : J \to \{1,2\}} \pi_T\big(\sigma_J = \iota \mid \sigma_A = \eta,\, \sigma_{U \setminus A} = 1\big)\, \pi_{T'}\big(\sigma_u = 1 \mid \sigma_{A'} = \eta_{A'},\, \sigma_{U' \setminus A'} = 1,\, \sigma_J = \iota\big). \tag{98}$$
Since $p_1$ is thus a convex combination, $|p_1 - p'_1| \le 1/(2n^{10})$ will follow by showing that, for any configuration $\iota : J \to \{1, 2\}$, it holds that
$$\Big|\pi_{T'}\big(\sigma_u = 1 \mid \sigma_{A'} = \eta_{A'},\, \sigma_{U' \setminus A'} = 1,\, \sigma_J = \iota\big) - p'_1\Big| \le 1/(2n^{10}). \tag{99}$$
We can expand $p'_1$ analogously to (98) by conditioning on the configuration on $J$, so to prove (99) it suffices to show that, for any two configurations $\iota_1, \iota_2 : J \to \{1, 2\}$, it holds that $\kappa \le 1/(2n^{10})$, where
$$\kappa := \Big|\pi_{T'}\big(\sigma_u = 1 \mid \sigma_{A'} = \eta_{A'},\, \sigma_{U' \setminus A'} = 1,\, \sigma_J = \iota_1\big) - \pi_{T'}\big(\sigma_u = 1 \mid \sigma_{A'} = \eta_{A'},\, \sigma_{U' \setminus A'} = 1,\, \sigma_J = \iota_2\big)\Big|.$$
By the strong spatial mixing result of Lemma 40, we have that
$$\kappa \le \sum_{w \in J} \Big(\frac{1-B}{1+B}\Big)^{\mathrm{dist}(u, w)} = |J|\, \Big(\frac{1-B}{1+B}\Big)^{L-1} \le b^L \Big(\frac{1-B}{1+B}\Big)^{L-1}.$$
Combining this with the choice of $M'_0$ (cf. (94)), we obtain $\kappa \le 1/(2n^{10})$, thus concluding the proof of $|p_1 - p'_1| \le 1/(2n^{10})$ and therefore completing the proof of Lemma 25.