Fast and perfect sampling of subgraphs and polymer systems

We give an efficient perfect sampling algorithm for weighted, connected induced subgraphs (or graphlets) of rooted, bounded degree graphs. Our algorithm utilizes a vertex-percolation process with a carefully chosen rejection filter and works under a percolation subcriticality condition. We show that this condition is optimal in the sense that the task of (approximately) sampling weighted rooted graphlets becomes impossible in finite expected time for infinite graphs and intractable for finite graphs when the condition does not hold. We apply our sampling algorithm as a subroutine to give near linear-time perfect sampling algorithms for polymer models and weighted non-rooted graphlets in finite graphs, two widely studied yet very different problems. This new perfect sampling algorithm for polymer models gives improved sampling algorithms for spin systems at low temperatures on expander graphs and unbalanced bipartite graphs, among other applications.


Introduction
Sampling is a fundamental computational task: given a specification of a probability distribution on a (large) set of combinatorial objects, output a random object with the specified distribution or with a distribution close to the specified distribution. This task becomes challenging when the specification of the distribution is much more succinct than the set of objects, and one wants to sample using time and space commensurate with the specification. Fundamental examples include sampling from Markov random fields and probabilistic graphical models and sampling substructures of graphs. We will address both of these examples here and connect them in a new way.
We consider a natural sampling problem: given a bounded-degree graph G, sample a graphlet (a connected, vertex-induced subgraph) of G containing a fixed vertex r with probability proportional to an exponential in the size of the subgraph. That is, sample a graphlet S containing vertex r with probability proportional to λ^{|S|}, where λ > 0 is a distribution parameter and |S| denotes the number of vertices in S. In this paper we are concerned with small values of λ, where the expected size of a sampled graphlet is much smaller than the size of the graph.
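To make the target distribution concrete, the following brute-force sketch (our own illustration, not the paper's algorithm) computes ν_{G,r,λ} exactly by enumerating every graphlet containing r and normalizing the weights λ^{|S|}. This is feasible only on tiny graphs; avoiding exactly this kind of enumeration is the point of the sampler developed below. All function names are ours.

```python
from itertools import combinations

def is_connected(adj, S):
    """Check via BFS that the subgraph induced by vertex set S is connected."""
    S = set(S)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in S and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == S

def graphlet_distribution(adj, r, lam):
    """Enumerate all graphlets containing r; return nu_{G,r,lam}: Pr[S] ~ lam**|S|."""
    n = len(adj)
    others = [v for v in range(n) if v != r]
    weights = {}
    for k in range(n):
        for extra in combinations(others, k):
            S = frozenset((r,) + extra)
            if is_connected(adj, S):
                weights[S] = lam ** len(S)
    Z = sum(weights.values())  # the partition function Z_{G,r,lam}
    return {S: w / Z for S, w in weights.items()}
```

For example, on the path 0-1-2 rooted at 0, the graphlets containing 0 are {0}, {0,1} and {0,1,2} (the set {0,2} induces a disconnected subgraph), so Z = λ + λ² + λ³.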
Sampling graphlets is an important task in data science, network analysis, bioinformatics, and sociology, as it allows us to gain information about massive graphs from small sections of it; see, e.g., [38,28,41,4]. A number of variants of the problem have consequently been studied, including sampling graphlets of a given size uniformly at random or sampling weighted graphlets of all sizes [46,5,37,15,10,11,43,1,42,45,12,9]. The variant we consider here, i.e., sample a graphlet S with probability proportional to λ^{|S|}, arises as a key subroutine in recent sampling algorithms for spin systems (hard-core model, Ising model, Potts model, etc.) in the regime of strong interactions via polymer models described below in Section 1.2; see [32,13,40,25,7,36,35,14,17].
One major limitation of previous sampling algorithms for graphlets and polymer models (those in, e.g., [32,40,45,16,24], among others) is the use of exhaustive enumeration of graphlets of a given size; this requires restrictive parameter regimes or large polynomial running times, with the maximum degree ∆ of the graph appearing in the exponent of the polynomial. Here we design a fast perfect sampling algorithm for weighted graphlets based on a vertex percolation process combined with a rejection filter. This method bypasses the enumeration barrier and allows us to design perfect sampling algorithms for a number of applications, substantially improving upon existing algorithms in three ways: 1) our algorithms have considerably faster running times, with no dependence on ∆ in the exponent; 2) our algorithms return perfect, rather than approximate, samples from the desired distributions; and 3) our algorithms are conceptually simple and practical to implement.
Our algorithm proceeds as follows. First, run a vertex percolation process on the graph G beginning at vertex r in a breadth-first search manner, repeatedly adding each adjacent vertex to the graphlet with a carefully-chosen probability p. Once the percolation process terminates, the graphlet is accepted as the random sample with a certain probability that depends on the graphlet and rejected otherwise; if the graphlet is rejected, the algorithm restarts another percolation process from r. Because of the careful way we choose the percolation and rejection probabilities, we can prove the final accepted sample is drawn exactly from the desired distribution and the expected running time is bounded by a constant that depends only on λ and the maximum degree ∆.

Sampling rooted graphlets
Our key contribution is a new algorithm for perfectly sampling weighted graphlets containing a given vertex r. Formally, let G = (V, E) be a finite or infinite graph of maximum degree ∆. For r ∈ V, let S(G, r) be the set of all connected, vertex-induced subgraphs of G containing r. (The subgraph induced by U ⊆ V has vertex set U and includes all the edges of G with both endpoints in U.) We call r the root of G and the elements of S(G, r) graphlets rooted at r. For λ > 0 define the probability distribution ν_{G,r,λ} on S(G, r) by

  ν_{G,r,λ}(S) = λ^{|S|} / Z_{G,r,λ},  where  Z_{G,r,λ} = Σ_{S′ ∈ S(G,r)} λ^{|S′|}.   (1)

The distribution is well defined when the normalizing constant Z_{G,r,λ}, known as the partition function, is finite. This is the case for every graph of maximum degree ∆ and every r when λ is below the critical threshold

  λ*(∆) = (∆−2)^{∆−2} / (∆−1)^{∆−1}   (2)

(see Lemma 3 below). We give an efficient perfect sampling algorithm for ν_{G,r,λ} for λ < λ*(∆).
▶ Theorem 1. Fix ∆ ≥ 3 and let λ < λ * (∆). There is a randomized algorithm that for any graph G = (V, E) of maximum degree ∆ and any r ∈ V outputs a graphlet distributed according to ν G,r,λ with expected running time bounded by a constant that depends only on ∆ and λ.
We assume a model of computation that allows querying the adjacency list of a given vertex of a bounded-degree graph in constant time, the standard model used in the study of sublinear algorithms [27]. We also assume access to a stream of perfectly random real numbers in [0, 1]. The model of computation is chosen for concreteness; in particular, our methods extend to other models, requiring only an adjustment of the running time to account for any additional computational overhead. Previous algorithms to generate ε-approximate samples from ν_{G,r,λ} (e.g., those in [32,45,16,24]) exhaustively enumerate all graphlets of size ≤ k, for some k that depends on the error parameter ε describing how accurate the sample must be. This results in algorithms with (1/ε)^{O(log ∆)} running times. Applications such as sampling from polymer models require multiple samples from ν_{G,r,λ} and, consequently, have a small error tolerance per sample; in particular, they require ε ≪ 1/n, which results in inefficient algorithms with overall running time n^{O(log ∆)}. The algorithm in Theorem 1, on the other hand, is an exact sampler whose expected running time depends only on ∆ and λ, and thus provides a significant advantage in applications, as we detail below.
We also show that Theorem 1 is sharp in two ways. First, we establish that there is no polynomial-time approximate sampling algorithm for ν_{G,r,λ} when λ > λ*(∆) for the class of graphs of maximum degree at most ∆ unless RP = NP. Second, in the infinite setting, the normalizing constant Z_{G,r,λ} may diverge (and consequently the distribution ν_{G,r,λ} is not well defined) when λ > λ*(∆); conversely, we prove that Z_{G,r,λ} is finite on every graph of maximum degree ∆ when λ ≤ λ*(∆).
▶ Lemma 2. If for every finite graph G = (V, E) of maximum degree ∆ ≥ 3 and every r ∈ V there is a polynomial-time approximate sampler for ν_{G,r,λ} when λ > λ*(∆), then RP = NP.
The proofs of these lemmas are omitted, but can be found in the full version of this paper [6].
Finally, we mention that the algorithmic result in Theorem 1 cannot be extended even to the case λ = λ*(∆): for the infinite ∆-regular tree, we can show that the expected size of a graphlet sampled from ν_{G,r,λ} is infinite when λ = λ*(∆), and so it is impossible to have sampling algorithms with finite expected running time. In summary, the algorithm in Theorem 1 for λ < λ*(∆), combined with the hardness/impossibility results in Lemmas 2 and 3 for λ > λ*(∆), provides a resolution to the computational problem of sampling from ν_{G,r,λ} on graphs of maximum degree at most ∆.
As mentioned, our sampling algorithm is based on exploring the connected component of r in a vertex-percolation process. We carefully choose a specific percolation parameter p ∈ (0, 1) as a function of λ and ∆ (see Lemma 9). We then perform breadth-first search (BFS) from r, labeling each new vertex encountered "active" with probability p and "inactive" with probability 1 − p independently over all vertices; we continue the BFS exploring only unexplored neighbors of active vertices. In this way we uncover the "active" component of r, call it γ. We then accept γ with a given probability depending on λ, ∆, |γ| and |∂γ|, where ∂γ denotes the set of vertices outside of γ that are adjacent to γ. If we reject γ, we begin again with a new percolation process. We note that only when λ < λ * (∆) does there exist a suitable percolation probability p that results in a subcritical percolation process, so that the size of the active component has finite expectation and exponential tails. Sampling a graphlet by exploring a random component and performing a rejection step has been used in the past (most notably in the recent work of Bressan [9] to sample uniformly random graphlets of size k; see also [2]). The weighted model we sample from is particularly well suited to this type of exploration algorithm because of the direct connection to a subcritical percolation process.
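The exploration-plus-rejection loop above can be sketched in Python for the unlabeled case (q = 1), under our reading of the text: p solves p(1−p)^{∆−2} = λ (found here by plain bisection rather than the paper's approximation scheme), the BFS reveals the active component γ of r, and a nonempty γ is accepted with probability (1−p)^{(∆−2)|γ|+2−|∂γ|}, while an empty component is always rejected. Function names are ours; this is a sketch, not the paper's exact implementation.

```python
import random
from collections import deque

def solve_p(lam, Delta, tol=1e-12):
    """Binary search for p in [0, 1/(Delta-1)) with p*(1-p)**(Delta-2) == lam.
    Such p exists exactly when lam < lambda*(Delta)."""
    lo, hi = 0.0, 1.0 / (Delta - 1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * (1 - mid) ** (Delta - 2) < lam:
            lo = mid
        else:
            hi = mid
    return lo

def sample_rooted_graphlet(adj, r, lam, Delta, rng=random):
    """Percolation BFS from r plus rejection filter: returns a graphlet
    containing r with probability proportional to lam**|S|."""
    p = solve_p(lam, Delta)
    while True:
        if rng.random() >= p:
            continue  # r itself inactive: empty component, always rejected
        active, boundary, explored = {r}, set(), {r}
        queue = deque([r])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in explored:
                    continue
                explored.add(v)  # each vertex's coin is flipped exactly once
                if rng.random() < p:
                    active.add(v)
                    queue.append(v)
                else:
                    boundary.add(v)
        # Accept with probability (1-p)^((Delta-2)|g| + 2 - |bd(g)|);
        # the exponent is >= 0 since |bd(g)| <= (Delta-2)|g| + 2.
        exponent = (Delta - 2) * len(active) + 2 - len(boundary)
        if rng.random() < (1 - p) ** exponent:
            return active
```

On K4 with λ = 0.1 < λ*(3) = 1/4, for instance, the singleton {r} should appear with probability λ/(λ + 3λ² + 3λ³ + λ⁴) ≈ 0.751, since each percolation outcome γ is produced with probability p^{|γ|}(1−p)^{|∂γ|} and the filter cancels the dependence on |∂γ|.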
We prove a more general version of Theorem 1 in Section 2, allowing for vertex-labeled graphlets and modifications of the weights by multiplication by a non-negative function bounded by 1. These generalizations are needed for the application to polymer models in Section 1.2.

Sampling from polymer models
We use our algorithm for sampling weighted rooted graphlets to design fast and perfect samplers for polymer models. Polymer models are systems of interacting geometric objects representing defects from pure ground states (i.e., most likely configurations) in spin systems on graphs in classical statistical physics [29,39,22]. These geometric objects are most often represented by vertex-labeled graphlets from a given host graph. Recently, polymer models have found application as an algorithmic tool to sample from spin systems on various classes of graphs in strong interaction regimes; see, e.g., [32,13,40,16,31,24,25,7,36,35,14,20,17]. In these applications, the problem of sampling weighted vertex-labeled rooted graphlets emerged as a significant computational barrier.
We will work with subset polymer models in which all polymers are vertex-labeled graphlets from a host graph G = (V, E). These models were defined in [29] and generalized in [39]. Such a polymer model consists of:
- A set C = C(G) of polymers, each of which is a graphlet in G with the vertices of the graphlet labeled with colors from a set Σ of size q.
- Weights w_γ ≥ 0 for each γ ∈ C.
- An incompatibility relation ≁ defined by connectivity: two polymers γ, γ′ ∈ C are incompatible, written γ ≁ γ′, if the union of their corresponding vertices induces a connected subgraph in G; otherwise they are compatible and we write γ ∼ γ′.
Let Ω(C) denote the set of all sets of pairwise compatible polymers from C. The polymer model is the Gibbs distribution µ on Ω(C) defined by

  µ(Γ) = (Π_{γ∈Γ} w_γ) / Z(C),  where  Z(C) = Σ_{Γ∈Ω(C)} Π_{γ∈Γ} w_γ

is the polymer model partition function. We say the weights of a polymer model are computationally feasible if w_γ can be computed in time polynomial in |γ|. The size |γ| of a polymer γ is the number of vertices in the corresponding graphlet. We will assume without loss of generality that all vertex-labeled graphlets of G, including each individual vertex v ∈ V, are elements of C, by setting w_γ = 0 when necessary. We let C_v be the set of all polymers containing vertex v.
Algorithms for sampling polymer models fall into two classes: those based on truncating the so-called cluster expansion of a polymer model to approximate a partition function and using self-reducibility to sample, and those based on Markov chains on the set of collections of compatible polymers. The cluster expansion approach, while giving polynomial-time algorithms, is generally relatively inefficient, with the degree of the polynomial bound on the running time growing with the degree of the underlying graph; e.g., running time n^{O(log ∆)} in an n-vertex graph of maximum degree ∆. The Markov chain approach can in principle be much faster (near linear time in the size of the graph) but runs into one hitch: a stricter condition on the parameters of the model is needed to perform one update step of the Markov chain (the "polymer sampling condition" in [16,24]). We solve this problem by adapting our rooted graphlet sampler to sample polymer models, leading to a near linear-time perfect sampling algorithm for polymer models under the least restrictive conditions known (see Table 1).

▶ Theorem 4. Consider a subset polymer model on a family of n-vertex graphs of maximum degree ∆ with computationally feasible weights satisfying condition (3). Suppose further that, for every vertex v, the weights satisfy condition (4), a summability condition over the polymers γ ≁ v. Then there is a perfect sampling algorithm for µ with expected running time O(n log n).
The threshold defined in (3) is the generalization of the critical threshold for rooted graphlet sampling to the labeled case (taking q = 1 recovers the definition in (2)). Theorem 4 improves upon the known results for sampling from polymer models in two ways. For a very general class of polymer models, our algorithm simultaneously provides perfect sampling with near-linear running time under weak conditions on the polymer weights. We now review previous works to illustrate these improvements; see the accompanying Table 1.
A number of conditions on polymer weights have been used to provide efficient sampling algorithms. The first papers in this direction (including [32,35,13]) used the Kotecký-Preiss condition for convergence of the cluster expansion of the polymer model partition function [39]; this condition is typically verified by ensuring a stronger vertex-wise bound, condition (5). Since the number of vertex-labeled rooted graphlets of size k in a maximum degree ∆ graph grows roughly like (eq∆)^{k−1} (see [8]), weights of polymers of size k must decay roughly like (e²q∆)^{−k} for the polymer model to satisfy (5), with the extra factor of e coming from the exponential on the left-hand side of (5).
The major downside of the algorithms based on the cluster expansion, i.e., those using (5) or the Kotecký-Preiss condition, is that the running times obtained are of the form n^{O(log ∆)}. Subsequent works, namely [16,24], addressed this downside but at the cost of a significantly stricter condition on the polymer weights.
In [16], the authors devised a new Markov chain algorithm for sampling from polymer models. The condition on the polymer weights for rapid mixing of this chain, the Polymer Mixing condition (6), is somewhat less restrictive than the Kotecký-Preiss condition: it requires weights of polymers of size k to decay like (eq∆)^{−k}, a savings of a factor of e in the base of the exponent over (5). However, to implement a single step of this Markov chain in constant expected time, a stronger condition (the Polymer Sampling condition (7)) was required. This is a significant loss of a factor of e³∆²q² in the base of the exponent compared to (5), but the resulting sampling algorithm does run in near linear time.
In [23], the authors use a different Markov chain condition, the Clique Dynamics condition, similar to (6), which requires weights of polymers of size k to decay like (eq∆)^{−k}, saving the same factor of e over (5). Their running times, though, are again of the form n^{O(log ∆)}, since implementing one step of their Markov chain involves enumerating rooted polymers of size O(log n).
Our results give a "best-of-both-worlds" for polymer sampling: under conditions (3) and (4), which both require polymer weights to decay like (eq∆)^{−k} and are similar to but slightly less restrictive than the polymer mixing condition (6), we obtain a near linear-time algorithm. Moreover, unlike any of the previous results, our algorithm is a perfect sampler.
To conclude this section, we comment briefly on the algorithm we design to sample from µ. Our starting point is the polymer dynamics Markov chain from [16]. We use it to implement a Coupling from the Past (CFTP) algorithm (see [44]). To do so efficiently (in terms of the number of steps of the Markov chain), we design a new "bounding Markov chain" for the polymer dynamics, a method pioneered in [33,30], and to implement each step of the Markov chain efficiently, we turn to our sampler for weighted rooted graphlets from Theorem 1.

Applications to spin systems
Our new algorithm for sampling subset polymer models can be used as a subroutine in essentially all previous applications of polymer models to spin system sampling at low temperatures, including those in [35,13,40,16,31,24,25,14,20,17]. This results in faster sampling algorithms under less restrictive conditions on model parameters in all those settings. As examples, we flesh out here the details of two of these applications; more details are provided in the full version of this paper [6].
Hard-core model on bipartite graphs. The hard-core model on a graph G is the probability distribution µ^{hc}_G on I(G), the set of all independent sets of G, with µ^{hc}_G(I) proportional to λ^{|I|}. The complexity of approximate counting and sampling from µ^{hc}_G on bounded-degree graphs is well understood: there is a computational threshold at some λ_c(∆), with efficient algorithms for λ < λ_c(∆) [49,3,18,19] and hardness above the threshold (no polynomial-time algorithms unless NP = RP) [47,26,48]. However, on bipartite graphs, the complexity of these problems is unresolved and is captured by the class #BIS (approximately counting independent sets in bipartite graphs) defined by Dyer, Goldberg, Greenhill, and Jerrum [21].
Theorem 4 implies the existence of a fast perfect sampling algorithm for the hard-core model in a certain class of bipartite graphs called unbalanced bipartite graphs, considered in [13,23].
▶ Corollary 5. There is a perfect sampling algorithm for µ^{hc}_G running in expected time O(n log n) for n-vertex bipartite graphs G with bipartition (L, R), maximum degree ∆_L in L, maximum degree ∆_R in R, and minimum degree δ_R in R, if condition (9) holds.
Approximate sampling algorithms with large polynomial running times were previously given for this problem when 6λ∆_L∆_R < (1 + λ)^{δ_R/∆_L} in [13] and when 3.3353·λ∆_L∆_R < (1 + λ)^{δ_R/∆_L} in [23]. Our result applies to a comparable parameter range: inequality (9) holds, for instance, when (1 + e)λ∆_L∆_R < (1 + λ)^{δ_R/∆_L}, or when 3λ∆_L∆_R < (1 + λ)^{δ_R/∆_L} and ∆_L < 6. More importantly, our algorithm is the first to achieve perfect sampling and near-linear running time.
Potts model on expander graphs. The Q-color ferromagnetic Potts model on a graph G is the distribution µ^{potts}_G on colorings σ : V → [Q] in which the probability of σ is proportional to e^{β m(G,σ)}, where m(G, σ) is the number of monochromatic edges of G under the coloring σ and β > 0 is a model parameter. When the parameter β is large and G has some structure (e.g., G is an expander graph), typical configurations drawn from µ^{potts}_G are dominated by one of the Q colors; that is, there is phase coexistence in the model. This enables sampling using subset polymer models.
Recall that an n-vertex graph G = (V, E) is an α-expander if for all subsets S ⊆ V with |S| ≤ n/2, the number of edges in E with exactly one endpoint in S is at least α|S|.
▶ Corollary 6. Consider the Q-color ferromagnetic Potts model on an α-expander n-vertex graph of maximum degree ∆. Suppose condition (10) holds. Then there is a sampling algorithm with expected running time O(n log n) that outputs a sample σ with distribution μ̂ so that ∥μ̂ − µ^{potts}_G∥_tv ≤ e^{−Ω(n)}.
Previously, [16] provided an ε-approximate sampler for µ^{potts}_G in time O(n log(n/ε) log(1/ε)) whenever β ≥ (5 + 3 log((Q−1)∆))/α. Condition (10) holds when β ≥ (1.2 + log((Q−1)∆))/α, so our algorithm applies to a larger range of parameters and removes the dependence on ε from the running time. We do not achieve perfect sampling in this application only because the subset polymer models used give approximations of µ^{potts}_G rather than describing µ^{potts}_G exactly.

Sampling unrooted graphlets in finite graphs
As another application of our algorithm for sampling weighted rooted graphlets, we next consider the problem of sampling weighted unrooted graphlets in a finite graph. Given a finite graph G, let S(G) be the set of all graphlets of G, and define the distribution ν_{G,λ} on S(G) by ν_{G,λ}(S) = λ^{|S|}/Z_{G,λ}, where Z_{G,λ} = Σ_{S′∈S(G)} λ^{|S′|}. Read-McFarland and Štefankovič [45] gave a polynomial-time approximate sampling algorithm for ν_{G,λ} for the class of maximum-degree ∆ graphs when λ < λ*(∆) and proved that there is no such algorithm for λ ∈ (λ*(∆), 1) unless NP=RP 1 . We give a new algorithm for this problem, covering the entire λ < λ*(∆) regime and improving on the result of [45] in two ways: (i) our running time is constant in expectation (with no dependence on n), while the running time of the ε-approximate sampler in [45] is n · (1/ε)^{O(log ∆)}; and (ii) our algorithm outputs a perfect sample instead of an approximate one (and thus the running time has no dependence on any approximation parameter).

▶ Theorem 7. Fix ∆ ≥ 3 and let λ < λ * (∆). Then for the class of finite graphs of maximum degree ∆ there is a randomized algorithm running in constant expected time that outputs a perfect sample from ν G,λ . The expected running time is bounded as a function of ∆ and λ.
The algorithm we use for this theorem is a modification of the one for sampling rooted graphlets. We pick a uniformly random v ∈ V , run the same BFS percolation exploration, and accept the connected component of v with an adjusted probability (to account for the fact that a graphlet can be generated from any of its vertices). The acceptance probability is bounded away from 0 and so the algorithm runs in constant expected time. As mentioned earlier, the ε-approximate sampling algorithm from [45] is based on the exhaustive enumeration of all subgraphs of size ≤ k, for some k that depends on ε. Our new algorithm entirely bypasses this enumeration barrier.
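The adjustment can be sketched in Python as follows. Since a graphlet S can arise as the active component of any of its |S| starting vertices, dividing the rooted acceptance probability by |S| makes the per-iteration output probability proportional to λ^{|S|}; this choice is consistent with the overall acceptance rate (1−p)²Z/n computed in the proof of Theorem 7, but the 1/|S| factor is our reconstruction of the adjustment, not a quote of the paper's formula, and the names are ours.

```python
import random
from collections import deque

def sample_unrooted_graphlet(adj, lam, Delta, rng=random):
    """Pick a uniform root, reveal its active percolation component S, and
    accept with the rooted filter divided by |S| (assumed adjustment), so
    that Pr[output = S] is proportional to lam**|S| over all graphlets."""
    # p in [0, 1/(Delta-1)) with p*(1-p)**(Delta-2) == lam, by bisection.
    lo, hi = 0.0, 1.0 / (Delta - 1)
    while hi - lo > 1e-12:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mid * (1 - mid) ** (Delta - 2) < lam else (lo, mid)
    p = lo
    n = len(adj)
    while True:
        r = rng.randrange(n)           # uniformly random starting vertex
        if rng.random() >= p:
            continue                   # empty component: reject, restart
        active, boundary, explored = {r}, set(), {r}
        queue = deque([r])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in explored:
                    continue
                explored.add(v)
                if rng.random() < p:
                    active.add(v)
                    queue.append(v)
                else:
                    boundary.add(v)
        exponent = (Delta - 2) * len(active) + 2 - len(boundary)
        # Extra 1/|S| factor: S can be generated from any of its vertices.
        if rng.random() < (1 - p) ** exponent / len(active):
            return active
```

On K4 with λ = 0.1, for instance, the partition function over all graphlets is 4λ + 6λ² + 4λ³ + λ⁴, so a size-1 graphlet should be output with probability 4λ/Z ≈ 0.862.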

Graphlet sampling: algorithms
In this section we present our efficient perfect sampling algorithm for weighted, vertex-labeled graphlets containing a fixed vertex r from a maximum degree ∆ graph; in particular, in Section 2.1, we prove a generalized version of Theorem 1 from the introduction. We also provide in Section 2.2 our algorithm for sampling weighted graphlets (i.e., the unrooted, unlabeled case) and establish Theorem 7.

Sampling rooted vertex-labeled graphlets
For U ⊆ V, let G[U] denote the corresponding vertex-induced subgraph of G; specifically, G[U] = (U, E(U)), where E(U) ⊆ E is the set of edges of G with both endpoints in U. A vertex-induced subgraph is a graphlet if it is connected. For r ∈ V, let S(G, r) be the set of all graphlets of G that contain vertex r. We call the graphlets in S(G, r) the graphlets rooted at r.
Let q ≥ 1 be an integer, let S(G, r, q) denote the set of graphlets rooted at r in which each vertex is labeled with a color from {1, . . . , q}, and let f : S(G, r, q) ∪ {∅} → [0, 1]. Define the probability distribution ν_{G,r,λ} on S(G, r, q) ∪ {∅} by setting

  ν_{G,r,λ}(γ) = f(γ)·λ^{|γ|} / Z(G, r, λ),  where  Z(G, r, λ) = Σ_{γ′ ∈ S(G,r,q)∪{∅}} f(γ′)·λ^{|γ′|}.

We assume that G, f, q and λ are such that Z(G, r, λ) is finite, so that this distribution is well defined. When q = 1 and f(γ) = 1(γ ≠ ∅), ν_{G,r,λ} corresponds exactly to the distribution defined in (1) over the unlabeled graphlets of G rooted at r. We consider the problem of sampling from ν_{G,r,λ}; this more general version of the sampling problem is later used as a subroutine for sampling polymer systems in Section 3. Let λ*(∆, q) = (∆−2)^{∆−2} / (q (∆−1)^{∆−1}); cf. (2). Our main algorithmic result for sampling colored rooted graphlets is the following.
▶ Theorem 8. Suppose ∆ ≥ 3, λ > 0, and q ≥ 1 are such that λ < λ*(∆, q), and let a > 0 be a fixed constant. There is a randomized algorithm to exactly sample from ν_{G,r,λ} for graphs G of maximum degree ∆ and functions f : S(G, r, q) ∪ {∅} → [0, 1] computable in time O(|γ|^a); this randomized algorithm has expected running time bounded by C · Z(G, r, λ)^{−1}, where C > 0 is a constant that depends only on q, λ, ∆ and a.
Theorem 1 from the introduction corresponds to the special case when q = 1 and f(γ) = 1(γ ≠ ∅) (in this case Z(G, r, λ) ≥ λ). Other mild assumptions on the function f, e.g., f(∅) = 1 or f(r) = 1, ensure that Z(G, r, λ) is bounded away from 0 and, consequently, that the sampling algorithm in the theorem has constant expected running time.
As a warm-up, let us first consider our algorithm for sampling labeled rooted graphlets on a finite graph G = (V, E) with f ≡ 1, purposely omitting certain non-essential implementation details for clarity. First, we find p ∈ (0, 1) such that (p/q)(1 − p)^{∆−2} = λ; this choice of p will be justified in what follows. The algorithm then repeats the following process until a vertex-labeled graphlet is accepted:
1. Each vertex of the graph is independently assigned, with probability p, a uniform random color from {1, . . . , q}; otherwise, with probability 1 − p, it is marked as "not colored".
2. Let γ be the vertex-labeled graphlet from S(G, r, q) ∪ {∅} corresponding to the colored connected component of r, i.e., the set of vertices connected to r by at least one path of colored vertices.
3. Observe that the probability of obtaining γ is (p/q)^{|γ|}(1 − p)^{|∂γ|}, where ∂γ denotes the set of vertices in G that are not in γ but are adjacent to a vertex in γ (with a slight abuse of notation, we let |∂∅| = 1). Our aim is to output γ with probability ∝ λ^{|γ|}, which has no dependence on ∂γ. Therefore, we use a "rejection filter" and accept γ only with probability (1 − p)^{(∆−2)|γ|+2−|∂γ|}, so that the probability that γ is the output becomes

  (p/q)^{|γ|}(1 − p)^{|∂γ|} · (1 − p)^{(∆−2)|γ|+2−|∂γ|} = (1 − p)^2 ((p/q)(1 − p)^{∆−2})^{|γ|}.   (12)

From (12), the choice of p such that (p/q)(1 − p)^{∆−2} = λ is apparent. We will prove that only when λ < λ*(∆, q) does there exist p ∈ (0, 1) such that (p/q)(1 − p)^{∆−2} = λ. In the actual implementation of the algorithm, it will in fact suffice to find an approximation for p.

We comment briefly on the intuition for the rejection filter. The acceptance probability must include a factor of (1 − p)^{−|∂γ|}, so that the final acceptance probability depends on |γ| but not on |∂γ|. However, (1 − p)^{−|∂γ|} > 1 is not a valid probability, so we use instead (1 − p)^{(∆−2)|γ|+2−|∂γ|}, which is at most 1 since (∆ − 2)|γ| + 2 ≥ |∂γ|. This bound on |∂γ| is best possible since it is tight for the ∆-regular tree. We note that using looser bounds for |∂γ| affects the range of the parameter λ for which we can find p ∈ (0, 1) so that (p/q)(1 − p)^{∆−2} = λ. Finally, we mention that the algorithm as described requires Ω(|V|) time per iteration and cannot be extended to infinite graphs. This is easily corrected by assigning colors starting from r and revealing only the colored component of r in a breadth-first fashion. The threshold λ*(∆, q) is sharp in the sense that only when λ < λ*(∆, q) can p be chosen so that the revealing process is subcritical and creates a small component with high probability. This ensures the algorithm can be implemented efficiently. In particular, we stress that our algorithm avoids exhaustively enumerating labeled graphlets, as done in previous methods [16].
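The bound |∂γ| ≤ (∆−2)|γ|+2 can be seen by edge counting: a connected γ spans at least |γ|−1 internal edges, so at most ∆|γ| − 2(|γ|−1) = (∆−2)|γ|+2 edges leave γ, and each vertex of ∂γ absorbs at least one of them. The brute-force sanity check below (our own illustration, not part of the paper's algorithm) verifies the bound over every graphlet of a small graph.

```python
from itertools import combinations

def is_connected(adj, S):
    """Check via DFS that the subgraph induced by vertex set S is connected."""
    S = set(S)
    seen, stack = set(), [next(iter(S))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(v for v in adj[u] if v in S)
    return seen == S

def check_boundary_bound(adj, Delta):
    """Verify |boundary(g)| <= (Delta-2)*|g| + 2 for every graphlet g of adj."""
    n = len(adj)
    for k in range(1, n + 1):
        for tup in combinations(range(n), k):
            S = set(tup)
            if not is_connected(adj, S):
                continue
            boundary = {v for u in S for v in adj[u]} - S
            if len(boundary) > (Delta - 2) * len(S) + 2:
                return False
    return True
```

Note the bound is already tight at |γ| = 1: a single vertex of degree ∆ has exactly ∆ = (∆−2)·1 + 2 boundary vertices, matching the tightness on the ∆-regular tree mentioned above.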
Before giving the implementation details of our algorithm and proving Theorem 8, we consider the problem of finding p ∈ (0, 1) such that (p/q)(1 − p)^{∆−2} = λ. For ∆ ≥ 3 and q ≥ 1, consider the real function g(x) = (x/q)(1 − x)^{∆−2}. It can be readily checked that g is continuous and differentiable in [0, 1], has a unique maximum at x = 1/(∆−1) with g(1/(∆−1)) = λ*(∆, q), is increasing in [0, 1/(∆−1)], and is decreasing in [1/(∆−1), 1]. This implies that only when λ < λ*(∆, q) does there exist a value of p ∈ [0, 1/(∆−1)) such that g(p) = λ. In particular, when λ > λ*(∆, q), there is no value of p for which g(p) = λ, and when λ = λ*(∆, q), the only possible value is p = 1/(∆−1). The latter case would result in a critical percolation process, corresponding to the fact that the expected size of a graphlet from ν_{G,r,λ} has no uniform upper bound in the class of graphs of maximum degree ∆; in fact, it is infinite on the ∆-regular tree. We can find a suitable approximation for p when λ < λ*(∆, q) via a simple binary search procedure.
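In code, the binary search reads as follows; this sketch assumes the form g(x) = (x/q)(1−x)^{∆−2}, which is increasing on [0, 1/(∆−1)] and peaks at λ*(∆, q), and the function names are ours.

```python
def g(x, Delta, q):
    """The percolation weight function (x/q) * (1-x)**(Delta-2)."""
    return (x / q) * (1 - x) ** (Delta - 2)

def lambda_star(Delta, q):
    """Critical threshold: the maximum of g, attained at x = 1/(Delta-1)."""
    return g(1.0 / (Delta - 1), Delta, q)

def solve_p(lam, Delta, q, tol=1e-12):
    """Unique p in [0, 1/(Delta-1)) with g(p) = lam; g is increasing there,
    so plain bisection converges. Requires lam < lambda_star(Delta, q)."""
    assert 0 < lam < lambda_star(Delta, q)
    lo, hi = 0.0, 1.0 / (Delta - 1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid, Delta, q) < lam:
            lo = mid
        else:
            hi = mid
    return lo
```

For ∆ = 3 and q = 1 this gives λ*(3, 1) = 1/4, and solve_p(0.1, 3, 1) returns the smaller root of p(1 − p) = 0.1.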
The proof of this lemma appears after the proof of Theorem 8. We now prove Theorem 8, including giving a more detailed version of the algorithm outlined above that includes the previously omitted implementation details and allows for general functions f : S(G, r, q) ∪ {∅} → [0, 1].
Proof of Theorem 8. For ease of notation, let λ* = λ*(∆, q). Our algorithm to sample from ν_{G,r,λ} when λ < λ* explores from r in a breadth-first manner and stops once it has revealed the colored connected component of r. It proceeds as follows:
1. Find p̃ ∈ [0, 1/(∆−1)) and λ̃ ∈ [λ, λ*) such that g(p̃) = λ̃ using the method from Lemma 9.
2. Let Q be a queue. With probability p̃, add r to Q, mark it as colored, and assign it a uniform random color from {1, . . . , q}. Mark r as explored.
3. While Q ≠ ∅, repeat the following: dequeue a vertex u and consider each unexplored neighbor w of u; with probability p̃, mark w as colored, assign it a uniform random color, and add it to Q. Mark w as explored (regardless of whether it was added to Q or not).

4. Let γ be the vertex-labeled graphlet from S(G, r, q) ∪ {∅} corresponding to the colored connected component of r. Accept γ with probability f(γ)·(λ/λ̃)^{|γ|}·(1 − p̃)^{(∆−2)|γ|+2−|∂γ|}.

5. If γ is rejected, go to Step 2 and repeat.
The probability of obtaining γ ∈ S(G, r, q) ∪ {∅} in an iteration of the algorithm is (p̃/q)^{|γ|}(1 − p̃)^{|∂γ|}, and thus the overall acceptance probability in an iteration is

  ρ = Σ_{γ} (p̃/q)^{|γ|}(1 − p̃)^{|∂γ|} · f(γ)(λ/λ̃)^{|γ|}(1 − p̃)^{(∆−2)|γ|+2−|∂γ|} = (1 − p̃)^2 Σ_{γ} f(γ)λ^{|γ|} = (1 − p̃)^2 Z(G, r, λ).

Then, conditioned on acceptance, the probability that the output is γ is f(γ)λ^{|γ|}/Z(G, r, λ) = ν_{G,r,λ}(γ), as desired.
We next bound the expected running time of the algorithm. We claim first that the expected running time per iteration is at most a constant that depends only on a, ∆ and q. If γ is the configuration generated in an iteration, it is discovered in O(|γ| + |∂γ|) = O(|γ|) time and, by assumption, f(γ) can be computed in at most O(|γ|^a) time, for a suitable constant a > 0. Let μ̂ be the output distribution of Step 3 of the algorithm. Then, there exists a constant C = C(q, ∆) > 0 such that the expected running time of each iteration is at most

  C · E_μ̂[1 + |γ|^{max(a,1)}].   (13)

We show next that |γ| (under μ̂) is stochastically dominated by a random variable W = X + Y (i.e., |γ| ≺ W), where X and Y are i.i.d. random variables corresponding to the cluster size of a homogeneous Galton-Watson tree with offspring distribution Bin(∆ − 1, p̃). To see this, first note that |γ| ≺ L, where L is the cluster size of a heterogeneous Galton-Watson tree, in which the root vertex has offspring distribution Bin(∆, p̃) and every other vertex has offspring distribution Bin(∆ − 1, p̃). This is because the branching process generating γ includes the root only with probability p̃ (the root is always present in the Galton-Watson tree) and, in addition, it considers at most ∆ (from the root) or ∆ − 1 (from any other vertex) potential branches. In turn, we have that L ≺ X + Y, since we can couple the first ∆ − 1 branches of the root with X (starting at the root) and the remaining branch with Y (starting at the child of the root not coupled with X).
It is well known that X and Y have finite moments when (∆ − 1)p̃ < 1 (see, e.g., [34]). In particular, there exists a constant A = A(a, ∆, p̃) > 0 such that E[1 + W^{max(a,1)}] ≤ A. This together with (13) shows that the expected running time in each iteration of the algorithm is bounded by C · A. Now, let R be the number of times Steps 2-5 are repeated and let T be the overall running time of the algorithm. Then E[T] ≤ C · A · E[R] = C · A/ρ = C · A · (1 − p̃)^{−2} · Z(G, r, λ)^{−1}, and the result follows. ◀
We conclude this section with the proof of Lemma 9.

Sampling unrooted graphlets
We consider next the problem of sampling weighted graphlets from a finite graph G = (V, E) of maximum degree ∆; specifically, in this variant of the sampling problem we consider unrooted, unlabeled, weighted graphlets of G. Let S(G) be the set of all graphlets of G. We define the probability distribution ν_{G,λ} on S(G) by setting

  ν_{G,λ}(S) = λ^{|S|} / Z(G, λ),  where  Z(G, λ) = Σ_{S′ ∈ S(G)} λ^{|S′|}.

The problem of (approximately) sampling from ν_{G,λ} is quite natural. In [45], it was established that this problem is computationally hard when λ > λ*(∆) = (∆−2)^{∆−2}/(∆−1)^{∆−1}; an ε-approximate sampling algorithm was also given in [45] for the case when λ < λ*(∆), with running time n · (1/ε)^{O(log ∆)}. We now provide the proof of Theorem 7, which says we can perfectly sample from ν_{G,λ} in constant expected time when λ < λ*(∆).
Proof of Theorem 7. For ease of notation, we set λ* = λ*(∆) throughout this proof. Our algorithm to sample from ν_{G,λ} is based on the algorithm to sample from ν_{G,r,λ} (the rooted, vertex-labeled, weighted case). The idea is to pick a root uniformly at random and run the algorithm for the rooted case from this random vertex, with the rejection filter adjusted to account for the fact that a graphlet can be generated from any of its vertices. It proceeds as follows:
1. Find p̂ ∈ [0, 1/(∆−1)) and λ̂ ∈ [λ, λ*) such that g(p̂) = λ̂, using the method from Lemma 9.
2. Pick a vertex r ∈ V uniformly at random.
3. Let Q be a queue. With probability p̂, add r to Q and mark it as colored. Mark r as explored.
4. While Q ≠ ∅, repeat the following:
5. Accept S with probability:
6. If S is rejected, go back to Step 2 and repeat.
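The loop above can be sketched end to end. The displayed acceptance probability is not reproduced here; as an illustrative stand-in, the sketch below uses the filter f(S) = (1−p̂)² (λ/λ̂)^{|S|} (1−p̂)^{(∆−2)|S|−|∂S|} / |S|, which is one choice consistent with the analysis: the percolation cluster of r equals S with probability p̂^{|S|}(1−p̂)^{|∂S|} for every r ∈ S, so with g(p̂) = λ̂ for the assumed form g(p) = p(1−p)^{∆−2}, each graphlet S is output in an iteration with probability (1−p̂)² λ^{|S|}/n, matching the acceptance rate ρ = (1−p̂)² Z(G,λ)/n stated in the running-time analysis. This is a reconstruction, not the paper's verbatim filter.

```python
import random
from collections import deque

def percolate(graph, r, p, rng):
    """Vertex-percolation cluster of r: each reached vertex is colored
    independently with probability p.  Returns (cluster, outer boundary)."""
    if rng.random() >= p:                      # the root is colored only w.p. p
        return frozenset(), frozenset()
    cluster, boundary, seen, q = {r}, set(), {r}, deque([r])
    while q:
        v = q.popleft()
        for u in graph[v]:
            if u not in seen:
                seen.add(u)
                if rng.random() < p:
                    cluster.add(u); q.append(u)
                else:
                    boundary.add(u)
    return frozenset(cluster), frozenset(boundary)

def sample_unrooted(graph, lam, lam_hat, p, rng):
    """Rejection sampler for nu_{G,lam} (illustrative filter): uniform
    random root, percolation with parameter p chosen so that
    p * (1-p)**(D-2) == lam_hat >= lam, and a 1/|S| factor correcting
    for the |S| possible roots of each graphlet S."""
    D = max(len(nbrs) for nbrs in graph.values())
    V = sorted(graph)
    while True:
        r = rng.choice(V)                      # Step 2: uniform random root
        S, bd = percolate(graph, r, p, rng)    # Steps 3-4: percolation from r
        if not S:
            continue                           # empty configuration: reject
        f = ((1 - p) ** 2 * (lam / lam_hat) ** len(S)
             * (1 - p) ** ((D - 2) * len(S) - len(bd)) / len(S))
        if rng.random() < f:                   # Step 5: adjusted filter, f <= 1
            return S
```

On the triangle with λ = λ̂ = 1/2 (so p̂ = 1/2, since ∆ = 2) the accepted output matches ν_{G,λ}: each singleton has probability (1/2)/Z, each edge (1/4)/Z, and the triangle (1/8)/Z, where Z = 19/8.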
The analysis of this algorithm is similar to that in the proof of Theorem 8. Let n = |V|. The probability that the algorithm outputs S in an iteration is:
Hence, conditioned on acceptance, the probability of obtaining S ∈ S(G) is ν_{G,λ}(S), and so the output distribution of the algorithm is ν_{G,λ}.
For the running time of the algorithm, we note that Step 4 is analogous to Step 3 of the algorithm in the proof of Theorem 8, and so the expected running time of each round is also bounded by a constant C = C(∆, p̂) > 0. Let T be the overall running time of the algorithm. From (16), we have that the overall acceptance probability in a round is ρ = (1−p̂)² Z(G,λ)/n. Then, as in (15), we deduce that E[T] = O(1). ◀

Applications to Polymer Models
In this section, we show how to use our algorithm for sampling rooted vertex-labeled graphlets from Section 2 to sample from subset polymer models and prove Theorem 4.
Consider a subset polymer model on an n-vertex graph G = (V, E); see Section 1.2 for the definition. Recall that we use C_v for the set of all polymers containing vertex v ∈ V, and let γ∅ denote the empty polymer. Define the distribution
where we assign w_{γ∅} = 1. The following Markov chain on Ω(C), introduced in [16], has stationary distribution µ and mixes in O(n log n) steps under the polymer mixing condition (6).
Pick v ∈ V uniformly at random and let S_v = {γ ∈ X_t : v ∈ γ} (note that S_v is either empty or contains a single polymer).
To implement a single update step, one must sample from ν_v in Step 3. To do so efficiently, the much stricter polymer sampling condition (7) was required in [16]. Here we give a fast perfect sampler for ν_v under a much weaker condition.
▶ Theorem 10. Consider a subset polymer model on a family of n-vertex graphs of maximum degree ∆ ≥ 3 with computationally feasible weights that satisfy w_γ ≤ λ^{|γ|} for some λ < λ*(∆, q). There is a randomized algorithm to sample perfectly from ν_v for any v ∈ V with expected running time bounded by a function of λ, ∆, and q.
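One plausible way to read the reduction to Theorem 8 is as a rejection step: given a perfect sampler for the λ-weighted distribution over C_v (plus the empty polymer, with weight 1), accepting a polymer γ with probability w_γ / λ^{|γ|} (at most 1 by the weight bound) yields a perfect sample from ν_v. A sketch under that reading; `sample_rooted`, `weight`, and the toy instance below are hypothetical stand-ins, not the paper's construction:

```python
import random

def sample_nu_v(sample_rooted, weight, lam, rng):
    """Turn a perfect sampler for the lam-weighted distribution over
    C_v plus the empty polymer (empty polymer has weight 1) into a
    perfect sampler for nu_v: accept a nonempty polymer S with
    probability weight(S) / lam**len(S), which is <= 1 because
    w_gamma <= lam**|gamma| by assumption."""
    while True:
        S = sample_rooted(rng)
        if not S:
            return S                          # empty polymer: weight 1, accept
        if rng.random() < weight(S) / lam ** len(S):
            return S

# hypothetical toy instance: C_v = {{v}, {v,u}}, w({v}) = 0.3, w({v,u}) = 0.2
LAM = 0.6
SINGLE, PAIR = frozenset({"v"}), frozenset({"v", "u"})
WEIGHTS = {frozenset(): 1.0, SINGLE: 0.3, PAIR: 0.2}

def toy_rooted(rng):
    # perfect LAM-weighted sampler for the toy instance
    return rng.choices([frozenset(), SINGLE, PAIR],
                       weights=[1.0, LAM, LAM ** 2])[0]
```

Conditioned on acceptance, the output probability of γ is proportional to λ^{|γ|} · (w_γ / λ^{|γ|}) = w_γ, as required by the definition of ν_v.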
Proof. This follows from Theorem 8. ◀
Using this theorem and the fast mixing result for the polymer dynamics of [16], one can approximately sample from µ whenever both the polymer mixing condition (6) and the assumptions of Theorem 10 hold. We further improve on this by giving a perfect sampling algorithm that works whenever a new condition (4) is satisfied (our algorithm also requires the assumptions of Theorem 10). In all known examples, condition (4) is more permissive than the polymer mixing condition (6) from [16].

Perfect Sampling for polymer systems: Proof of Theorem 4
As mentioned, the polymer dynamics from [16] is not a perfect sampler. We propose here a different algorithm that outputs a perfect sample from µ. Our algorithm uses the coupling from the past method [44] together with the notion of bounding Markov chains [33, 30] to implement it efficiently.
This implementation of the coupling from the past algorithm provides a perfect sample from µ; see [44]. It remains for us to show that it can be implemented efficiently. For this, we first show that the expected number of steps of the polymer dynamics bounding chain throughout the execution of the algorithm is O(n log n). Afterwards, we show how to implement these steps so that they can be executed in amortized constant expected time.
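The coupling from the past scheme with a doubling time horizon can be sketched generically. In the sketch below, the two extremal states of a small monotone toy chain (a lazy ±1 walk, chosen purely for illustration) play the role that the bounding chain states B_t, D_t play for the polymer dynamics:

```python
import random

def cftp(step, top_bottom, seed=0):
    """Propp-Wilson coupling from the past with the doubling scheme.
    `step(x, u)` is a deterministic, monotone update driven by
    randomness u in [0, 1); the same u's are reused across restarts,
    as CFTP requires.  `top_bottom` = (top, bottom) extremal states."""
    top0, bot0 = top_bottom
    us = []                          # u_{-1}, u_{-2}, ... generated lazily
    rng = random.Random(seed)
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        hi, lo = top0, bot0
        for t in range(T - 1, -1, -1):   # apply u_{-T}, ..., u_{-1} in order
            hi, lo = step(hi, us[t]), step(lo, us[t])
        if hi == lo:
            return hi                # coalesced: an exact stationary sample
        T *= 2                       # otherwise restart from further in the past

def lazy_walk_step(x, u, k=5):
    # lazy +-1 walk on {0, ..., k}; monotone in x for every fixed u,
    # and doubly stochastic, so its stationary distribution is uniform
    if u < 0.25:
        return max(x - 1, 0)
    if u < 0.5:
        return min(x + 1, k)
    return x
```

Once the trajectories started from the two extremal states coalesce, every trajectory in between has coalesced as well, which is exactly the property the bounding chain certifies for the polymer dynamics.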
▶ Lemma 11. Suppose the subset polymer model satisfies condition (4). Then, the expected number of steps of the polymer dynamics bounding chain in the coupling from the past algorithm is O(n log n).
Proof. See the full version [6], where a potential ϕ_t describing the size of D_t is introduced and shown to decrease quickly. ◀
It remains to consider how to efficiently implement the steps of the polymer dynamics bounding chain. This is subtle because D_t may initially contain an exponentially large number of polymers, and care is thus needed in how D_t is represented and stored. The following lemma says that we can represent and update B_t and D_t efficiently.
Proof. See the full version [6]. ◀