Abstract 1 Introduction 2 Technical Preliminaries 3 Our Results References

Brief Announcement: Congested Clique Counting for Local Gibbs Distributions

Joshua Z. Sobel ORCID Department of Computer Science, University of Iowa, Iowa City, IA, USA
Abstract

There are well established reductions between combinatorial sampling and counting problems (Jerrum, Valiant, Vazirani TCS 1986). Building off of a very recent parallel algorithm utilizing this connection (Liu, Yin, Zhang arxiv 2024), we demonstrate the first approximate counting algorithm in the CongestedClique for a wide range of problems. Most interestingly, we present an algorithm for approximating the number of q-colorings of a graph within ϵ-multiplicative error, when q>αΔ for any constant α>2, in O~(n1/3ϵ2) rounds. More generally, we achieve a runtime of O~(n1/3ϵ2) rounds for approximating the partition function of Gibbs distributions defined over graphs when simple locality and fast mixing conditions hold. Gibbs distributions are widely used in fields such as machine learning and statistical physics. We obtain our result by providing an algorithm to draw n random samples from a distributed Markov chain in parallel, using similar ideas to triangle counting (Dolev, Lenzen, Peled DISC 2012) and semiring matrix multiplication (Censor-Hillel, Kaski, Korhonen, Lenzen, Paz, Suomela PODC 2015). Aside from counting problems, this result may be interesting for other applications requiring a large number of samples.

Keywords and phrases:
Distributed Sampling, Approximate Counting, Markov Chains, Gibbs Distributions
Copyright and License:
[Uncaptioned image] © Joshua Z. Sobel; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Distributed algorithms
; Theory of computation Random walks and Markov chains
Related Version:
Full Version: https://arxiv.org/abs/2508.13083
Acknowledgements:
Thanks to Sriram Pemmaraju for suggestions and discussions.
Funding:
The author was partially supported during this project by National Science Foundation grants 1955939 and 2402835.
Editor:
Dariusz R. Kowalski

1 Introduction

In a pioneering work, Valiant proposed the computational complexity class #P [12]. Problems in #P are the counting variants of decision problems in NP; for example, counting the number of proper vertex colorings of a graph using at most q colors (q-colorings). Unfortunately, Valiant established that even when a decision problem is in P, such as determining whether a bipartite graph has a perfect matching, the corresponding counting problem can be NP-hard. Thus for most counting problems, we need to settle for ϵ multiplicative approximation rather than exact counting. The goal is to estimate a true count X by X^, such that (1ϵ)XX^(1+ϵ)X with probability at least 34. By the standard median trick, 3/4 can be boosted to any constant less than one.

The typical method for approximate counting involves drawing random samples that are solutions to the corresponding search problem. In other words, we can count the number of colorings of a graph by sampling random colorings. This approach dates back to Jerrum, Valiant, and Vazirani. The initial reduction from counting to sampling [8] requires drawing a very large number of samples; this has been improved over several subsequent works [7, 1, 13, 10]. In particular, when considering Gibbs distributions defined over graphs with n vertices and meeting mild conditions, Štefankovič, Vempala, and Vigoda [13] showed that counting can be accomplished by taking O~(nϵ2) samples. Gibbs distributions originated in physics, and generalize many problems such as the uniform distribution of q-colorings.

Unfortunately, the algorithm from [13] cannot be fully parallelized, as samples drawn later depend on samples drawn earlier. Very recently, Liu, Yin, and Zhang [10] also showed that for many Gibbs distributions defined over graphs counting can be done using O~(nϵ2) total samples, but in their approach every sample can be taken in parallel. This implies that counting in the CongestedClique reduces to quickly drawing O~(nϵ2) samples. From existing work, [5, 6, 4], we know that for many Gibbs distributions drawing a single sample within total variation distance error of δ can be done in O(lognδ) CongestedClique rounds, using the distributed Metropolis-Hastings Markov chain. For Gibbs distributions that we call local, we identify a novel structural similarity between simulating one transition of n instances of the Markov chain and semiring matrix multiplication. In particular, utilizing the approach of Dolev, Lenzen, and Peled [3] for triangle counting and Censor-Hillel, Kaski, Korhonen, Lenzen, Paz, and Suomela [2] for semiring matrix multiplication, we can simulate one transition of n instances of the Markov chain in O(n1/3) rounds and thus often obtain an approximate count in O~(n1/3ϵ2) CongestedClique rounds. The full version of our paper contains all of our results and proofs.

1.1 Main Results

Our central contribution is an algorithm to draw many samples at once from the distributed Metropolis-Hastings Markov chain. Beyond counting, this result may be useful in other algorithms where several samples are needed.

Theorem 1.

One transition can be simulated for n instances of the distributed Metropolis-Hastings chain in O(n1/3) rounds.

As a corollary to the previous theorem and the results of [10], under mild locality and mixing conditions which we will describe later, there is an O~(n1/3ϵ2) round counting algorithm. As the most interesting example, we obtain a fast algorithm for approximating the number of q-colorings of an input graph with degree Δ. In addition, our full paper demonstrates a much faster algorithm for the hardcore model with low fugacity.

Corollary 2.

There is an algorithm for approximating the number of q-colorings of a graph in O~(n1/3ϵ2) rounds, when q>αΔ, for α>2, within multiplicative error ϵ, with probability at least 34.

We also give a conditional lower bound for approximating a partition function, Theorem 10. Even under our locality and mixing assumptions, counting when ϵ132n2 is as hard as triange detection.

2 Technical Preliminaries

2.1 Gibbs Distributions on Graphical Models

Consider a traditional graph labeling problem, where each vertex receives a label from some underlying alphabet 𝒜. We focus on randomly sampling from a subset, 𝒮, of such labelings, given a distribution. In particular, we want to sample from a Gibbs distribution. Borrowing terminology from physics, we have the Hamiltonian, a function H:𝒮{0,1,,h}, for some integer h0.

Definition 3 (Gibbs Distribution).

The Gibbs distribution at inverse temperature β0 assigns a labeling, σ𝒮, a probability proportional to exp(βH(σ)).

In the trivial case, if we choose H=0, we get a uniform distribution over 𝒮.

Definition 4 (Partition Function).

The normalizing constant of the distribution, Z(β)=σ𝒮exp(βH(σ)), is referred to as the partition function.

We will make the assumption that |𝒜| and h are bounded by a polynomial in n. This is not a significant restriction and is necessary both algorithmically and to ensure messages are of size O(logn).

For this paper we will focus heavily on the well studied Potts model.

Definition 5.

The Potts model is a Gibbs distribution where 𝒜=[q], is a set of colors. 𝒮 is [q]V. The Hamiltonian, H, assigns to each, possibly improper, coloring the number of monochromatic edges (edges where both vertices have the same color).

Note that when β=, the Potts model simply becomes the uniform distribution over proper colorings.111We treat the infinite case in the sense of the limit as β. Furthermore, Z() counts the number of proper colorings. On the other hand, when β=0, the model is uniform over all proper and improper colorings. Then Z(0)=qn. More precisely, when β>0, this model is called the antiferromagnetic Potts model, since neighboring vertices favor not having the same color.

2.2 Counting From Sampling

We would now like to estimate Z(β), for a given choice of β. We call this the generalized counting problem, as Z gives the weighted count over all labelings. As discussed in the introduction, it is often sufficient to take O~(nϵ2) samples. More precisely, the result from Liu, Yin, and Zhang [10] is summarized by the next theorem.

Theorem 6.

For a Gibbs distribution and inverse temperatures 0β1<β2, Z(β2)Z(β1) can be computed with a multiplicative error of ϵ with probability at least 34 given the Hamiltonian of O~(nϵ2) samples taken from the Gibbs distribution for inverse temperatures ranging from [β1,β2]. Furthermore, the temperature each sample is taken at can be precomputed and only depends on n, h, and |𝒜|.

The theorem implies that we can approximate Z(β), as long as Z(β) is known in advance for some β0 and that we can draw O~(nϵ2) samples for inverse temperatures in the range [min(β,β),max(β,β)].

2.3 Pairwise Markov Random Fields

The traditional way to sample from a Gibbs distribution is using a rapidly mixing Markov chain. We begin from an arbitrary state (a graph labeling) and simulate transitions until the chain is mixed, meaning that the current state is roughly drawn from the desired distribution. Here we will focus on sampling from the probability space defined by a pairwise Markov random field. A pairwise Markov random field on a graph consists of vertex constraints and edge constraints.

For each vertex v, we have a constraint bv:𝒜0. For each edge e=(u,v) we also have a constraint Ae:𝒜×𝒜0. Here the edges are undirected, but the function Ae is not necessarily symmetric. The probability of graph vertex labeling l is defined to be proportional to

vbv(l(v))e=(u,v)Ae(l(u),l(v))
Definition 7 (Local Gibbs Distribution).

We call a Gibbs distribution local if

  1. 1.

    The distribution can be represented as a pairwise Markov random field.

  2. 2.

    The Hamiltonian H can be represented as a sum over sub-functions defined on each edge and vertex. H(σ)=vHv(σ)+e=(u,v)He(u,v). Furthermore, each of these subfunctions also has a codomain of [h].

  3. 3.

    An element σ𝒮 can be constructed in the CongestedClique in O(logn) rounds.

The third condition is needed to initialize the Markov chain.

Example 8 (Potts Model).

We can see that the Potts model is a local Gibbs distribution. For convenience we make the change of variable, β=lnλ. We choose bv(x)=1 and Ae(x,y)={λ,if x=y1,else

2.4 Distributed Sampling

Feng, Sun, and Yin [5] found a preliminary distributed Markov chain to sample from pairwise Markov random fields. In the case of q-colorings, this was improved simultaneously by Fischer and Ghaffari [6] and Feng, Hayes, and Yin [4]. The improved chain can be generalized beyond colorings, see [11] by Pemmaraju and Sobel for the chain explicitly given in full generality. One transition of the chain, for some yet to be determined probability p, is defined in the following pseudocode. Every vertex v has its current label Xv. For ease of notation, in the edge acceptance step, any fraction that is not well defined (because certain vertices may not be active) equals 1. Note that on some pairwise Markov random fields this chain may have a very large mixing time, logn and possibly infinite.

Algorithm 1 Distributed Metropolis-Hastings.

3 Our Results

3.1 Fast Mixing for the Potts Model

For the Potts model, the distributed Metropolis-Hastings Markov chain has a mixing time of O(lognδ). As the proof technique is standard, we defer the proof to the appendix of the full version of our paper.

3.2 Fast Simulation of Distributed Metropolis-Hastings

We would now like to estimate the partition function of a local Gibbs distribution. As described earlier, we need to know the Hamiltonian of k=O~(nϵ2) samples from the Gibbs distribution. This can be found by running k instances of the distributed Metropolis-Hastings Markov chain. We can simulate one transition of n instances of the chain in parallel.

Theorem 1. [Restated, see original statement.]

One transition can be simulated for n instances of the distributed Metropolis-Hastings chain in O(n1/3) rounds.

3.2.1 Algorithm

Initialization

Each chain starts at an initial labeling. At time step t, each vertex will hold Xv,i,t, the label of vertex v in chain i. It will also sample its new proposed label, σv,i,t. In the case the vertex is inactive in a chain, σv,i,t=.

To decide if Xv,i,t+1σv,i,t, assuming σv,i,t, v needs to know if any of its incident edges are rejected in chain i. We will use the technique from Dolev, Lenzen, and Peled [3] for triangle counting and also used by Censor-Hillel, Kaski, Korhonen, Lenzen, Paz, Suomela [2] for matrix multiplication over semirings.

For convenience, we will use 0 based indexing. Let A be the adjacency matrix of the graph. Consider an n×n×n cube, Q, where Qv,w,i is true iff Av,w=1 and {v,w} is not accepted in chain i. Furthermore, think of Q evenly divided into n, n2/3×n2/3×n2/3 subcubes. Let subcube Q[x,y,z] refer to the subcube over the indices [xn2/3,(x+1)n2/3)×[yn2/3,(y+1)n2/3)×[zn2/3,(z+1)n2/3). We will use a similar notation for matrices, A[x,y] refers to the submatrix over the indices [xn2/3,(x+1)n2/3)×[yn2/3,(y+1)n2/3).

Computing 𝑸

Each subcube is arbitrarily assigned to a machine that will be responsible for generating the entries of the subcube. Because edges are non-directional, we assign the subcubes Q[x,y,z] and Q[y,x,z] to the same machine. Each machine is assigned at most two subcubes.

Consider the machine 𝖬 holding the subcubes Q[x,y,z] and Q[y,x,z]. To fill in the subcubes, 𝖬 needs to hold A[x,y], σ[x,z], σ[y,z], X[x,z], and X[y,z]. Altogether, 𝖬 needs to receive Θ(n4/3) words. For every machine to do this step in parallel, each machine needs to send and receive Θ(n4/3) words. This communication is done in Θ(n1/3) rounds, using Lenzen’s result [9]. 𝖬 then uses the information it received in addition to its randomness to fill in the (possibly two) subcube(s) it is responsible for. More precisely, 𝖬 runs the edge acceptance step of the Markov chain for the edge-chain pairs it is responsible for and fills in the subcube(s).

Updating Labels

𝖬 will compress each subcube it holds into a submatrix by applying union along the y-axis. In particular, the subcube Q[x,y,z] is flattened into the submatrix Ry[x,z], where Ry[x,z]i,k=true if there exists j such that Q[x,y,z]i,j,k=true.

Finally, vertex u accepts its proposal in chain i if yRu,iy=false, as this indicates that no incident edge to u was rejected. To compute this for every chain, u receives Ru,iy for all y and i. Doing this simultaneously for all vertices, requires each machine to send and receive O(n4/3) words of information. This takes a final O(n1/3) rounds.

Corollary 9.

With success probability at least 34, we can approximate the partition function, Z(β), of a local Gibbs distribution with multiplicative error ϵ in O~(n1/3ϵ2) rounds, when Z(β) is known for some β and the associated distributed Metropolis-Hastings chain has an O(logn) mixing time for inverse temperatures in the interval [min(β,β),max(β,β)].

Proof.

Theorem 6 and Theorem 1 almost immediately give us the result. We draw O~(nϵ2) samples from the Gibbs distribution by simulating O(logn) transitions of the Markov chain. There is one slight problem, the samples are held in a distributed fashion and we need the Hamiltonian of each sample held centrally at some machine. We use a very similar strategy as before, with a slight change described in our full paper.

Corollary 2. [Restated, see original statement.]

There is an algorithm for approximating the number of q-colorings of a graph in O~(n1/3ϵ2) rounds, when q>αΔ, for α>2, within multiplicative error ϵ, with probability at least 34.

3.3 Lower Bound

We can show that even in the case where the distributed Metropolis-Hastings chain has mixing time O(lognδ), we still have conditional hardness results.

Consider the following Gibbs distribution. 𝒜=V[3n]. 𝒮 are the labelings such that every vertex’s label is either in [3n] or is a neighboring vertex. We let the Hamiltonian count the number of edges {u,v} such that the labels of u and v are the same vertex. It is not hard to show that this is a local Gibbs distribution and that the mixing time of the distributed Metropolis-Hastings chain is O(lognδ) for β0.

As always, when β=0, all labelings in 𝒮 are equally likely. Z(0)=v3n+deg(v). On the other hand, when β=, we remove labelings where two neighboring vertices are labeled the same mutually neighboring vertex. This can only happen if there is a triangle in the graph. Thus, Z(0)>Z() if and only if there is a triangle in the graph. In fact, if the graph has at least one triangle, a,b,c, then 1Z()Z(0)116n2, since that is a lower bound on the probability that vertices a and b both choose c in a uniform random labeling. Rearranging, Z()Z(0)(1116n2)Z(0)(1132n2)2.

Note that we can easily compute Z(0) exactly. We can choose ϵ=132n2 and take an approximate count of Z(), Z¯. Suppose the graph doesn’t have a triangle. Then, Z()=Z(0). Thus, Z¯(1132n2)Z(0). Otherwise, the graph has a triangle. Then, Z¯(1+132n2)Z()(1132n2)2(1+132n2)Z(0)<(1132n2)Z(0). Thus, we can use Z¯ to determine if the graph has a triangle.

Theorem 10.

Approximating the partition function of a local Gibbs distribution when ϵ132n2 with success probability p is as hard as triangle detection with success probability p. This is true even when samples can be taken for all β0 in O~(1) rounds.

Note that the best known algorithm for triangle detection takes O(n0.157) rounds [2, 14].

References

  • [1] Ivona Bezáková, Daniel Štefankovič, Vijay V. Vazirani, and Eric Vigoda. Accelerating simulated annealing for the permanent and combinatorial counting problems. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA ’06, pages 900–907, USA, 2006. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=1109557.1109656.
  • [2] Keren Censor-Hillel, Petteri Kaski, Janne H. Korhonen, Christoph Lenzen, Ami Paz, and Jukka Suomela. Algebraic methods in the congested clique. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC ’15, pages 143–152, New York, NY, USA, 2015. Association for Computing Machinery. doi:10.1145/2767386.2767414.
  • [3] Danny Dolev, Christoph Lenzen, and Shir Peled. "tri, tri again": finding triangles and small subgraphs in a distributed setting. In Proceedings of the 26th International Conference on Distributed Computing, DISC’12, pages 195–209, Berlin, Heidelberg, 2012. Springer-Verlag. doi:10.1007/978-3-642-33651-5_14.
  • [4] Weiming Feng, Thomas P. Hayes, and Yitong Yin. Distributed symmetry breaking in sampling (optimal distributed randomly coloring with fewer colors), 2018. arXiv:1802.06953.
  • [5] Weiming Feng, Yuxin Sun, and Yitong Yin. What can be sampled locally? In Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC ’17, pages 121–130, New York, NY, USA, 2017. Association for Computing Machinery. doi:10.1145/3087801.3087815.
  • [6] Manuela Fischer and Mohsen Ghaffari. A Simple Parallel and Distributed Sampling Technique: Local Glauber Dynamics. In Ulrich Schmid and Josef Widder, editors, 32nd International Symposium on Distributed Computing (DISC 2018), volume 121 of Leibniz International Proceedings in Informatics (LIPIcs), pages 26:1–26:11, Dagstuhl, Germany, 2018. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.DISC.2018.26.
  • [7] Mark Jerrum. A very simple algorithm for estimating the number of k-colorings of a low-degree graph. Random Struct. Algorithms, 7(2):157–165, September 1995. doi:10.1002/RSA.3240070205.
  • [8] Mark R. Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169–188, 1986. doi:10.1016/0304-3975(86)90174-X.
  • [9] Christoph Lenzen. Optimal deterministic routing and sorting on the congested clique. In Proceedings of the 2013 ACM Symposium on Principles of Distributed Computing, PODC ’13, pages 42–50, New York, NY, USA, 2013. Association for Computing Machinery. doi:10.1145/2484239.2501983.
  • [10] Hongyang Liu, Yitong Yin, and Yiyao Zhang. Work-efficient parallel counting via sampling, 2024. doi:10.48550/arXiv.2408.09719.
  • [11] Sriram V. Pemmaraju and Joshua Z. Sobel. Exact distributed sampling. In Structural Information and Communication Complexity: 30th International Colloquium, SIROCCO 2023, Alcalá de Henares, Spain, June 6–9, 2023, Proceedings, pages 558–575, Berlin, Heidelberg, 2023. Springer-Verlag. doi:10.1007/978-3-031-32733-9_25.
  • [12] L.G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8(2):189–201, 1979. doi:10.1016/0304-3975(79)90044-6.
  • [13] Daniel Štefankovič, Santosh Vempala, and Eric Vigoda. Adaptive simulated annealing: A near-optimal connection between sampling and counting. J. ACM, 56(3), May 2009. doi:10.1145/1516512.1516520.
  • [14] Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou. New bounds for matrix multiplication: from alpha to omega. In ACM-SIAM Symposium on Discrete Algorithms, 2023. URL: https://api.semanticscholar.org/CorpusID:259937341.