Testing versus estimation of graph properties, revisited

A distance estimator for a graph property $\mathcal{P}$ is an algorithm that given $G$ and $\alpha, \varepsilon>0$ distinguishes between the case that $G$ is $(\alpha-\varepsilon)$-close to $\mathcal{P}$ and the case that $G$ is $\alpha$-far from $\mathcal{P}$ (in edit distance). We say that $\mathcal{P}$ is estimable if it has a distance estimator whose query complexity depends only on $\varepsilon$. Every estimable property is also testable, since testing corresponds to estimating with $\alpha=\varepsilon$. A central result in the area of property testing, the Fischer--Newman theorem, gives the converse statement: every testable property is in fact estimable. The proof of Fischer and Newman was highly ineffective, since it incurred a tower-type loss when transforming a testing algorithm for $\mathcal{P}$ into a distance estimator. This raised the natural problem, studied recently by Fiat--Ron and by Hoppen--Kohayakawa--Lang--Lefmann--Stagni, whether one can find a transformation with a polynomial loss. We obtain the following results. 1. If $\mathcal{P}$ is hereditary, then one can turn a tester for $\mathcal{P}$ into a distance estimator with an exponential loss. This is an exponential improvement over the result of Hoppen et al., who obtained a transformation with a double exponential loss. 2. For every $\mathcal{P}$, one can turn a testing algorithm for $\mathcal{P}$ into a distance estimator with a double exponential loss. This improves over the transformation of Fischer--Newman that incurred a tower-type loss. Our main conceptual contribution in this work is that we manage to turn the approach of Fischer--Newman, which was inherently ineffective, into an efficient one. On the technical level, our main contribution is in establishing certain properties of Frieze--Kannan weak regular partitions that are of independent interest.


Background on graph property testing
Property testers are fast randomized algorithms that can distinguish between objects satisfying some predetermined property P and those that are ε-far from satisfying P. In most cases, ε-far means that an ε-proportion of the object's representation needs to be changed in order to obtain a new object satisfying P. Hence, testing for P is a relaxed version of the classical decision problem which asks to decide whether an object satisfies P. In this paper we study properties of graphs in the so-called adjacency matrix model (which is also sometimes referred to as the dense graph model). This is arguably one of the most well-studied models in the area of property testing. The reader is referred to [20] for more background and references on property testing.
We now introduce the model of testing graph properties in the adjacency matrix model. A graph property P is a family of graphs closed under isomorphism. A graph G on n vertices is ε-far from P if one should add/delete at least $\varepsilon n^2$ edges to turn G into a graph satisfying P. If G is not ε-far from P then it is ε-close to P. A tester for P is a randomized algorithm that given ε > 0 distinguishes with high probability (say, 2/3) between graphs satisfying P and those that are ε-far from P. We assume the algorithm can query for each 1 ≤ i, j ≤ n whether the input G contains the edge (i, j). The edge query complexity, denoted Q(ε), of a tester is the number of edge queries it performs. If P has a tester whose edge query complexity depends only on ε (and is independent of n) then P is called testable. In what follows we will mainly work with vertex query complexity, which is the smallest q = q(ε) so that we can ε-test P by inspecting a subgraph of the input graph G, induced by a set of q randomly selected vertices. By a theorem of Goldreich and Trevisan [22] we know that $q(\varepsilon) \le 2Q(\varepsilon) \le q^2(\varepsilon)$. In most (but not all) discussions below we will not care much about these quadratic factors. In such cases we might use the term query complexity without mentioning if this is vertex or edge query complexity.
Property testing in the adjacency matrix model was first introduced by Goldreich, Goldwasser and Ron [21], who proved that every partition property (e.g. k-colorability and MAX-CUT) is testable. There are several general results guaranteeing that a graph property is testable [3,10]. A result of this nature was obtained by Alon and Shapira [5] who proved that every hereditary graph property is testable. Their proof applied Szemerédi's regularity lemma [35] (see also [33]), which is one of the most useful tools when studying properties of dense graphs. Using this tool comes with a hefty price, since the bounds one obtains when using the regularity lemma are of tower-type.
One of the central open (meta) problems related to testing graph properties is when can one turn an ineffective (e.g. one with tower-type bounds) result into an efficient one, preferably with polynomial bounds. While this is a quantitative question, what lies beneath it is in fact the following qualitative problem: when can we prove a testability result while avoiding Szemerédi's regularity lemma, either by giving a direct combinatorial argument or by using a weaker variant of the regularity lemma (e.g. the Frieze-Kannan regularity lemma [18] which we discuss below). For example, Rödl and Duke [31] used the regularity lemma in order to (implicitly) prove that k-colorability is testable. The tower-type bounds obtained in [31] were improved to polynomial in [21] using a direct argument which avoided the use of the regularity lemma. A specific central open problem, due to Alon and Fox [4], concerns hereditary properties, and asks which hereditary properties are testable with query complexity poly(1/ε). A systematic investigation of this problem was carried out in [19].

Distance estimation
In the dense graph model we say that a graph's distance from P is α, if α is the smallest real so that G is α-close to P. In other words, this is the minimum number of edges one should add/delete in order to obtain a graph satisfying P, normalised by $n^2$. We denote this quantity by $\mathrm{dist}_P(G)$. A distance estimator for P is a randomized algorithm that given α, ε > 0 distinguishes with high probability (say, 2/3) between graphs that are (α − ε)-close to P and those that are α-far from P. If for every α, ε there is a distance estimator for P whose query complexity depends only on ε, then P is said to be estimable. Note that testing P is equivalent to distance estimation with α = ε, hence this notion is at least as strong as testability.
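To make the normalisation concrete, here is a minimal brute-force sketch that computes this distance for one specific property, bipartiteness (2-colorability). The function name and the edge-list representation are illustrative assumptions and do not come from the paper; the running time is exponential in n, so this is only meant for tiny examples.

```python
from itertools import product

def dist_to_bipartite(n, edges):
    """Edit distance of an n-vertex graph to bipartiteness, normalised by n^2.

    Bipartiteness is closed under edge deletion, so adding edges never helps:
    for each 2-colouring we only count the monochromatic edges to delete.
    """
    best = len(edges)
    for colouring in product((0, 1), repeat=n):
        monochromatic = sum(1 for u, v in edges if colouring[u] == colouring[v])
        best = min(best, monochromatic)
    return best / n**2

# A triangle needs exactly one deletion, so its distance is 1/9.
triangle = [(0, 1), (1, 2), (0, 2)]
print(dist_to_bipartite(3, triangle))
```

In this example the triangle is α-close to bipartiteness for every α ≥ 1/9 and α-far for every smaller α.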
Distance estimation was first studied in [30] and has since been studied in various other settings such as distributions [7], strings [6], sparse graphs [11,13,28], boolean functions [1,9], error correcting codes [23,26] and image processing [8]. It is known that in certain settings, there are testable properties which are not estimable [15]. One of the central and most unexpected results in the area of graph property testing is the Fischer-Newman theorem [16], which states that in the setting of graphs, every testable property is also estimable. As with several of the main results in this area, the proof in [16] relied on Szemerédi's regularity lemma [35] and thus resulted in a tower-type loss when transforming a tester for P into a distance estimator for P. Returning to the discussion in the last paragraph of the previous subsection, it is natural to ask if one can improve the transformation of [16] and turn a tester for P into a distance estimator with a polynomial loss.

New results concerning hereditary graph properties
As we mentioned in the previous subsection, the family of hereditary graph properties has been extensively studied within the setting of graph property testing. The fact that every hereditary property is testable follows from the following statement, where we use ind(F, G) to denote the probability that a random mapping φ : V(F) → V(G) is an injective induced homomorphism.
▶ Lemma 1 (Induced Removal Lemma, [5]). For every ε > 0 and every hereditary P, there exist $M = M_1(\varepsilon, P)$, $\delta = \delta_1(\varepsilon, P) > 0$ and $n_0 = n_1(\varepsilon, P)$ such that if a graph G on $n \ge n_0$ vertices is ε-far from P then there is a graph F ∉ P on at most M vertices with ind(F, G) ≥ δ.
The first version of the above lemma was obtained by Alon, Fischer, Krivelevich and Szegedy [2] who proved it when P can be characterized using a finite number of forbidden induced subgraphs. The lemma was proved in full generality by Alon and Shapira [5]. Alternative proofs were later obtained by Lovász and Szegedy [27], Conlon and Fox [12] and Borgs et al. [10]. It was also extended to the setting of hypergraphs by Rödl and Schacht [32].
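The quantity ind(F, G) used in the lemma can be computed exactly on tiny graphs by enumerating all maps from V(F) to V(G). The sketch below does this by brute force; the function name and the vertex-count-plus-edge-list representation are assumptions made for illustration only.

```python
from itertools import product

def ind(f_n, f_edges, g_n, g_edges):
    """Probability that a uniformly random map V(F) -> V(G) is an injective
    induced homomorphism, computed by enumerating all g_n ** f_n maps."""
    F = {frozenset(e) for e in f_edges}
    G = {frozenset(e) for e in g_edges}
    good = 0
    for phi in product(range(g_n), repeat=f_n):
        if len(set(phi)) < f_n:
            continue  # phi is not injective
        # Induced: a pair is an edge of F exactly when its image is an edge of G.
        if all((frozenset((u, v)) in F) == (frozenset((phi[u], phi[v])) in G)
               for u in range(f_n) for v in range(u + 1, f_n)):
            good += 1
    return good / g_n**f_n

k3 = [(0, 1), (1, 2), (0, 2)]
print(ind(2, [(0, 1)], 3, k3))          # a single edge inside a triangle
print(ind(3, [(0, 1), (1, 2)], 3, k3))  # an induced path on 3 vertices inside a triangle
```

For the single edge, 6 of the 9 maps are injective and all of them land on edges, giving 2/3; the triangle contains no induced path on three vertices, so the second value is 0.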
Note that it follows immediately from Lemma 1 that every hereditary property is testable with vertex query complexity
$q(\varepsilon) = \max\{n_0, M/\delta\}$. (1)
Indeed, the algorithm samples a set X of q vertices, queries about all pairs within X, and then accepts if and only if the graph on X satisfies P. If G satisfies P then the algorithm clearly answers correctly (with probability 1). If G is ε-far from P, then by Lemma 1 a random M-tuple of vertices spans an induced copy of a graph F ∉ P with probability at least δ. Hence, a sample of size M/δ contains an induced copy of F with probability at least 2/3, thus guaranteeing that the sample of vertices does not satisfy P (since P is hereditary).
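The tester described above can be sketched in a few lines. The example instantiates it for triangle-freeness; the function names, the `has_edge` query oracle and the choice of sample size are assumptions for illustration, and the value of q needed for a 2/3 success guarantee is the one dictated by (1), not the small number used here.

```python
import random
from itertools import combinations

def canonical_tester(n, has_edge, q, satisfies, rng=random):
    """Sample q vertices, query all pairs among them, and accept iff the
    induced subgraph satisfies the property."""
    sample = sorted(rng.sample(range(n), min(q, n)))
    induced = {(u, v) for u, v in combinations(sample, 2) if has_edge(u, v)}
    return satisfies(sample, induced)

def triangle_free(vertices, edges):
    """Property check on the sample; edges are stored as (u, v) with u < v."""
    return not any((a, b) in edges and (b, c) in edges and (a, c) in edges
                   for a, b, c in combinations(sorted(vertices), 3))

# The complete graph is rejected by any sample of at least 3 vertices,
# while the empty graph is always accepted.
print(canonical_tester(50, lambda u, v: True, 5, triangle_free))
print(canonical_tester(50, lambda u, v: False, 5, triangle_free))
```

The one-sided behaviour is visible in the design: a graph satisfying a hereditary property is never rejected, because every induced subgraph of it also satisfies the property.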
Recall that [22] proved that if P is testable, then it is testable using an algorithm as above. Hence, the bounds in Lemma 1 more or less determine the query complexity of testing a hereditary P. This raises the following natural problem, introduced by Hoppen et al. [25,24] and by Fiat and Ron [14], asking if it is possible to estimate every hereditary P with (roughly) the same query complexity with which it can be tested as in (1).
▶ Remark 3. There are hereditary graph properties (e.g. triangle-freeness) for which the best known bounds for M and δ in Lemma 1 are of tower-type. One can argue that in such cases there is little difference between the tower(M/δ) bounds given by [16] and those suggested by Problem 2. However, we should emphasize that for many of these properties (e.g. triangle-freeness) the tower-type bounds are not known to be tight (indeed, the best known lower bounds are just slightly super polynomial). Perhaps more importantly, there are numerous hereditary graph properties for which it is known that both M and δ in Lemma 1 are polynomial in ε (e.g. k-colorability, being an interval graph or being a line graph; see the detailed discussion in [19]). For all these properties, Problem 2 suggests a poly(1/ε) bound, versus the tower(1/ε) bound given by [16].
Problem 2 was studied by Hoppen et al. [25,24]. Their main result was that every hereditary P is estimable with query complexity $2^{\mathrm{poly}((1/\delta)^{M^2},\, \log n_0)}$. Our first main result is the following exponential improvement of this result, making a significant step towards resolving Problem 2.
In almost all cases, results concerning testing of dense graphs rely on combinatorial statements which imply trivial algorithms. For example, the algorithm for testing a hereditary property P is trivial once we have Lemma 1 at our disposal. In sharp contrast, many estimation results involve sampling a set of vertices and then carrying out a highly non-trivial computation over this sample. This is certainly the case in the present paper, see the proofs of Lemmas 14 and 15. However, thanks to a well known sampling trick [21], one can transfer any estimation result into a combinatorial statement. For example, this trick gives the following corollary of Theorem 4.
▶ Corollary 6. Set $q = 2^{\mathrm{poly}(M/\delta,\, \log n_0)}$ as in Theorem 4. Then
where the probability is over randomly selected subsets X of q vertices from G, and G[X] is the graph induced by G on X.
It is interesting to note that with Corollary 6 at hand, we can now go back and reprove Theorem 4 using the "trivial/natural" algorithm which samples a set of q vertices X, computes $\mathrm{dist}_P(G[X])$, and then states that G is (α − ε)-close to P if $\mathrm{dist}_P(G[X]) \le \alpha - \varepsilon/2$ and is otherwise α-far from P.
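A minimal sketch of this natural algorithm follows. The distance routine is passed in as a parameter; for the demonstration we plug in the property of having no edges, whose distance is simply the edge count divided by the squared number of vertices. All names, the `has_edge` oracle and the sample size are assumptions for illustration; the value of q for which the guarantee actually holds is the one from Corollary 6.

```python
import random
from itertools import combinations

def dist_to_edgeless(num_vertices, edges):
    """Distance to the property of having no edges: delete every edge."""
    return len(edges) / num_vertices**2

def natural_estimator(n, has_edge, q, alpha, eps, dist_fn, rng=random):
    """Sample q vertices, compute the distance of the induced subgraph,
    and compare it with the threshold alpha - eps/2."""
    X = rng.sample(range(n), q)
    induced = [(u, v) for u, v in combinations(X, 2) if has_edge(u, v)]
    if dist_fn(q, induced) <= alpha - eps / 2:
        return "close"  # declare G to be (alpha - eps)-close to P
    return "far"        # declare G to be alpha-far from P

print(natural_estimator(100, lambda u, v: True, 20, 0.2, 0.1, dist_to_edgeless))
print(natural_estimator(100, lambda u, v: False, 20, 0.2, 0.1, dist_to_edgeless))
```

On the complete graph every 20-vertex sample spans 190 edges, so the sampled distance is 190/400 and the estimator answers "far"; on the empty graph the sampled distance is 0 and it answers "close".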
Our proof of Theorem 4 actually gives the bound $2^{\mathrm{poly}(M/\varepsilon\delta,\, \log n_0)}$. One can speculate that poly(M/εδ) = poly(M/δ) since in all known cases δ is at best polynomial in ε, and in many cases much smaller. In order to formally be able to remove the dependence on ε from our bound, we prove the following proposition, where P is trivial if either P contains all graphs or if it contains finitely many graphs. The proof of this proposition relies on a subtle application of Ramsey's theorem.
▶ Proposition 7. The following holds for every non-trivial hereditary property P. If q(ε) denotes the vertex query complexity of P then for every small enough ε, we have
where $M = M_1(\varepsilon, P)$ and $\delta = \delta_1(\varepsilon, P)$ are the constants of Lemma 1.
The left inequality above follows from (1). Observe that the lower bound on q(ε) is best possible since it is tight when P is the property of having no edges (in which case q(ε) = O(1/ε)). The proof of the proposition will appear in the journal version of the paper.
It is of course natural to study Problem 2 also for specific hereditary properties. A natural problem of this type is whether every hereditary P that is testable with query complexity poly(1/ε) is also estimable with query complexity poly(1/ε). Such an investigation was initiated recently by Fiat and Ron [14] who proved such a statement for many natural hereditary properties such as chordality and not containing an induced path on 4 vertices.

New results concerning general graph properties
Given the discussion above, the following problem seems natural.
Prior to this work, the only result concerning general graph properties P was the transformation of Fischer and Newman [16] which turns a testing algorithm for a graph property P with query complexity q(ε) into a distance estimator with query complexity tower(q(ε/2)). Using the tools we develop in order to obtain Theorem 4, we also obtain the following improved bound.
▶ Theorem 9. If P is testable with query complexity q(ε) then it is estimable with query complexity $2^{\mathrm{poly}(1/\varepsilon) \cdot 2^{q(\varepsilon/2)}}$.
We would like to argue at this point that since any "natural" property satisfies q(ε) ≥ log(1/ε) the above bound can be written as exp(exp(poly(q(ε/2)))). In order to formally make such a claim, we prove the following variant of Proposition 7, in which P is unnatural if there is $\varepsilon_0$ so that the following holds for every $0 < \varepsilon < \varepsilon_0$ and $n \ge n_0(\varepsilon)$: either every n-vertex graph is ε-close to P, or every n-vertex graph does not belong to P. If P is not unnatural then it is (naturally) natural.
▶ Proposition 10. Let P be a natural property and let q(ε) be its vertex query complexity, and Q(ε) be its edge query complexity. Then
In particular, q(ε) = Ω( 1/ε).

The "in particular" part above follows directly from the Goldreich-Trevisan [22] theorem mentioned earlier. Observe that the general lower bound given in (3) is best possible since it is tight when P is the property of having no edges, where Q(ε) = O(1/ε). The proof of the proposition will appear in the journal version of the paper.

Summary of previous approaches
The main reason why Szemerédi's regularity lemma is so useful when studying testing/estimation problems is that an ε-regular partition of a graph G determines (approximately) the values of ind(F, G) for all small F. Hence, on a very high level, the way one can estimate a graph's distance to a hereditary property P is to take a single ε-regular partition of G (one such exists by the regularity lemma) and then try to modify this partition using the smallest possible number of edge modifications, so that the new partition "predicts" that there are no induced copies of graphs F ∉ P in the new graph G′. A key "continuity" feature one has to use at this stage is that if G has a regular partition with certain edge densities between the clusters of the partition, and one would like to modify G so that in the new graph G′ one has a regular partition where the edge densities between the clusters will change on average by γ, then one can achieve this by modifying $(\gamma + o(1))n^2$ edges of G. Fischer and Newman [16] critically relied on the fact that regular partitions in the sense of Szemerédi have this continuity property. The approach of [16] was ineffective since although a regular partition has constant size (i.e., depending only on ε), this constant has tower-type dependence on ε. We should point out that one of the key novel ideas of [16] was a method for obtaining the densities of a single Szemerédi partition of the input G.
The way Hoppen et al. [25,24] managed to improve upon [16] (for hereditary P) was by first observing that in order to estimate ind(F, G) for all small F, one does not need the full power of Szemerédi's regularity lemma. Instead, one can use the weak regularity lemma of Frieze and Kannan [17] which involves constants that are only exponential in 1/ε. The main reason why their proof gave a doubly exponential bound is that Frieze-Kannan regular partitions do not (seem to) have the same continuity feature we mentioned in the previous paragraph with respect to Szemerédi partitions. To overcome this, Hoppen et al. [25,24] introduced a sophisticated method that combines working with Frieze-Kannan regular partitions in some parts of the proof, together with vertex partitions that have no regularity features at all (these are sometimes called GGR partitions, after [21]) in other parts of the proof.

Our main technical contribution
Our main technical contribution in this paper establishes that Frieze-Kannan weak regular partitions "almost" satisfy the same continuity feature we mentioned above with respect to Szemerédi partitions. What we show is that one can indeed efficiently modify a Frieze-Kannan partition if one starts with a partition with guarantees slightly stronger than those of Frieze-Kannan, and one is content with ending with a usual Frieze-Kannan partition. See Lemma 28 for the precise statement, whose proof relies on a randomized-rounding-type argument. With the above continuity feature at hand, we can now go back to the Fischer-Newman approach and turn it into an effective one, by taking full advantage of the Frieze-Kannan lemma. One additional hurdle we need to overcome in order to make sure we only incur an exponential loss in our proof, is a method for finding a Frieze-Kannan partition of a graph using a constant number of queries. Here we introduce a variant of the method of Fischer-Newman tailored for Frieze-Kannan partitions, see Lemma 14. The main tools we develop for proving Theorem 4 turn out to be also applicable for proving Theorem 9. The reason why in Theorem 9 we have a double exponential loss is that it is not enough to estimate ind(F, G) for a single F (as in Theorem 4 thanks to Lemma 1) but we instead need to control ind(F, G) for all graphs F of order q(ε). We expect Lemmas 14 and 28 to be applicable in future studies related to efficient testing and estimation of graph properties.

Paper overview
In Section 2 we introduce the two main lemmas in the paper, and show how they imply Theorem 4. These lemmas are proved in Sections 3 and 4. In Section 5 we prove Theorem 9. We prove Proposition 7 at the end of Section 2 and Proposition 10 at the end of Section 5. We use a = poly(x) to denote the fact that a is bounded from above (or below, when 0 < x < 1) by $x^d$ for some fixed d, which is independent of n or ε. Also, when we say that "for every a = poly(x) there is b = poly(x)" we mean that for every d there is

The Key Lemmas and Proof of Theorem 4
Our goal in this section is to state Lemmas 14 and 15 and then use them to derive Theorem 4. We prove these lemmas in Sections 3 and 4. At the end of this section we also prove Proposition 7.
To state Lemmas 14 and 15 we need some definitions. We first recall that given a graph G = (V, E), an equipartition of V is a partition of V into sets whose sizes differ by at most 1. Given a graph G and subsets X, Y ⊆ V(G), we use e(X, Y) to denote the number of edges between X and Y, and d(X, Y) = e(X, Y)/(|X||Y|) to denote the density between them.
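As a small illustration of this notation, the sketch below computes e(X, Y) and d(X, Y) for disjoint vertex sets. The adjacency-set representation and the function names are assumptions made for this example.

```python
def edge_count(adj, X, Y):
    """e(X, Y): number of edges with one endpoint in X and the other in Y.
    Assumes X and Y are disjoint and adj maps each vertex to its neighbour set."""
    return sum(1 for x in X for y in Y if y in adj[x])

def density(adj, X, Y):
    """d(X, Y) = e(X, Y) / (|X| |Y|)."""
    return edge_count(adj, X, Y) / (len(X) * len(Y))

# Three of the four possible edges between {0, 1} and {2, 3} are present.
adj = {0: {2, 3}, 1: {2}, 2: {0, 1}, 3: {0}}
print(density(adj, [0, 1], [2, 3]))
```

Here the density is 3/4, since only the pair (1, 3) is missing.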

▶ Definition 11 (Signature). For an equipartition
2 of the pairs i < j. A (γ, γ)-signature is referred to as a γ-signature.
▶ Definition 12 (Index of a partition). For an equipartition A of V(G) into t sets, we define the index of A to be
▶ Definition 13 (Final partition). For a function f : N → N and γ > 0, we say that an equipartition A of G consisting of t sets is (f, γ)-final if there exists no equipartition B of V(G) with at least t and up to f(t) sets for which ind(B) ≥ ind(A) + γ.
The above notion of a final partition is useful since (as we show later) every graph has such a partition and furthermore, we can design an algorithm for finding a signature of one such partition of an input G.The first key lemma leading to the proof of Theorem 4 does exactly that.
and such that the following holds. If G is a graph on at least N vertices then there is an algorithm making at most q queries to G, computing with probability at least 2/3 a γ-signature of an $(f_\zeta, \gamma)$-final partition of G into at least k and at most T sets.
We prove the above lemma in Section 3. The following is the second key lemma, which we prove in Section 4. In its statement we use the notation ind(F, G) which we defined before the statement of Lemma 1. What it roughly states is that having a signature of G (with good parameters) is enough for estimating G's distance to satisfying P.
▶ Lemma 15. For every h, ε, δ > 0, there are $\gamma = \gamma_{15}(h, \varepsilon, \delta)$, $s = s_{15}(h, \varepsilon, \delta)$ and a function $f_{15}$ such that the following holds. For every family H of graphs, each on at most h vertices, there exists a deterministic algorithm that receives as input a γ-signature S of an $(f_{15}, \gamma)$-final partition A into t ≥ s sets of a graph G with $n \ge N_{15}(h, \varepsilon, \delta, t) = \mathrm{poly}(t) \cdot 2^{\mathrm{poly}(h/\varepsilon\delta)}$ vertices, and distinguishes, given any α, between the following two cases:
(i) G is (α − ε)-close to some graph G′ with ind(H, G′) = 0 for every H ∈ H;
(ii) G is α-far from every graph G′ with ind(H, G′) < δ for every H ∈ H.
Proof (of Theorem 4). Suppose P is a hereditary graph property, and let α, ε > 0. Lemma 1 with inputs ε/2 and P asserts that there are h, δ and $n_0$ so that if a graph G on at least $n_0$ vertices is ε/2-far from P, then ind(H, G) ≥ δ for some H ∉ P with |V(H)| ≤ h. We need to describe an algorithm that makes $2^{\mathrm{poly}(h/\delta,\, \log n_0)}$ queries to G and distinguishes with probability at least 2/3 between the case that G is (α − ε)-close to P and the case that G is α-far from P. Set
Finally, set ζ = δε/h and observe that
Also, note that by Proposition 7 we have poly(1/ζ) = poly(h/δ). Let q, N, T be the parameters given by Lemma 14 when applied with k = s, and ζ, γ, f defined above (note that γ and f satisfy the assumptions of the lemma). Lemma 14 then guarantees that $q, N, T \le 2^{\mathrm{poly}(1/\zeta)} \le 2^{\mathrm{poly}(h/\delta)}$.
If G has fewer than N vertices then we can just ask about all the edges of G and answer correctly with probability 1. The number of queries is then at most $N^2 \le 2^{\mathrm{poly}(h/\delta)}$ as needed. If G has more than N vertices then we can use the algorithm of Lemma 14 with the parameters k, ζ, γ, f defined above. The algorithm makes at most $q \le 2^{\mathrm{poly}(h/\delta)}$ queries and with probability at least 2/3 returns a γ-signature S of an equipartition of G into s ≤ t ≤ T sets that is (f, γ)-final. Let
Again, if G has fewer than $N_1 = \max\{N', n_0\}$ vertices then we can just ask about all the edges of G and answer correctly with probability 1. The number of queries is then at most $(N_1)^2 \le 2^{\mathrm{poly}(h/\delta,\, \log n_0)}$ as needed.
Suppose then that G has at least $\max\{N, N_1\}$ vertices. Let H be the family of graphs on at most h vertices which do not satisfy P. Then we can now run the algorithm of Lemma 15 on the signature S, with respect to H, with α′ = α − ε/2 and with ε/2 instead of ε (note that we chose the parameters with ε/2). If the algorithm says that case (i) holds (namely that G is (α′ − ε/2)-close to some G′ with ind(H, G′) = 0 for every H ∈ H) then we declare that G is (α − ε)-close to P, and if the algorithm says that case (ii) holds (namely that G is α′-far from every G′ with ind(H, G′) < δ for every H ∈ H) then we declare that G is α-far from P.
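The way the two lemmas are wired together, including the shift to α′ = α − ε/2 and ε/2, can be summarised by the following sketch. The two callables are hypothetical stand-ins for the algorithms of Lemma 14 and Lemma 15 (their real inputs and outputs are only described abstractly in the text), so this shows the control flow and nothing more.

```python
def estimate_via_signature(alpha, eps, find_signature, decide_case):
    """Control flow of the estimator in the proof of Theorem 4.

    find_signature: hypothetical stand-in for Lemma 14; returns a signature S.
    decide_case:    hypothetical stand-in for Lemma 15; called with
                    (S, alpha', eps') and returns "i" or "ii".
    """
    signature = find_signature()
    alpha_prime = alpha - eps / 2          # alpha' = alpha - eps/2
    case = decide_case(signature, alpha_prime, eps / 2)
    if case == "i":
        return "close"                     # G is (alpha - eps)-close to P
    if case == "ii":
        return "far"                       # G is alpha-far from P
    raise ValueError("decide_case must return 'i' or 'ii'")

# Stub usage: the stand-ins are placeholders, not real implementations.
print(estimate_via_signature(0.5, 0.25, lambda: "S", lambda S, a, e: "ii"))
```

Note that case (i) is tested at distance α′ − ε/2 = α − ε, which is why the final answer in that branch refers to α − ε.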
Let us prove the correctness of the above algorithm. Suppose first that G is (α − ε)-close to P, that is, G is (α′ − ε/2)-close to some G′ satisfying P. Since P is hereditary and no H ∈ H satisfies P, we have ind(H, G′) = 0 for every H ∈ H, so the algorithm will say that case (i) holds, hence the algorithm answers correctly in this case. Suppose now that G is α-far from P. Then any G′ that is α′-close to G must be ε/2-far from P. Hence, by Lemma 1 in any such G′ we have ind(H, G′) ≥ δ for at least one H ∈ H. We conclude that G is α′-far from every G′ satisfying ind(H, G′) < δ for every H ∈ H. Hence, the algorithm of Lemma 15 will say that case (ii) holds, so our algorithm will answer correctly in this case as well. ◀

Proof of Lemma 14
The proof is similar to one in [16]. What they have shown is that for every f, γ, one can find an (f, γ)-final partition with a constant, albeit huge tower-type, query complexity. What we do here is show that for restricted types of f, one can get a much better bound. To do this we also need to rely on a recent result of [34].

Preliminary lemmas
In this subsection we describe some preliminary lemmas that will be used in the next subsection in which we prove Lemma 14. We will need the following Chernoff-type large deviation inequality.
▶ Lemma 16. Suppose $X_1, \dots, X_m$ are m independent Boolean random variables, so that for every
▶ Definition 17 (Partition Properties). A partition property is a triple π = (s, ℓ, u) where s is an integer (the size of the partition property), ℓ is a vector of $\binom{s}{2}$ reals $0 \le \alpha_{i,j} \le 1$ for each 1 ≤ i < j ≤ s, and u is a vector of $\binom{s}{2}$ reals $0 \le \beta_{i,j} \le 1$ for each 1 ≤ i < j ≤ s. We say that a graph G satisfies π if there is an equipartition {V
Given s and µ we use π(s, µ) to denote the family of partition properties π of size s in which every $\alpha_{i,j}$ and $\beta_{i,j}$ is an integer multiple of µ (so π(s, µ) contains $\{0, \mu, 2\mu, \dots, 1\}^{2\binom{s}{2}}$ partition properties). Finally, define $\Pi(t, \mu) = \bigcup_{s \le t} \pi(s, \mu)$.
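To get a feeling for how large these families are, the following sketch lists the grid {0, µ, 2µ, ..., 1} and counts the members of π(s, µ): each of the $2\binom{s}{2}$ entries of ℓ and u independently takes one grid value. The function names are illustrative assumptions, and the code assumes 1/µ is an integer, as the grid notation suggests.

```python
from fractions import Fraction
from math import comb

def grid(mu):
    """All integer multiples of mu in [0, 1]; assumes 1/mu is an integer."""
    steps = int(1 / mu)
    return [k * mu for k in range(steps + 1)]

def num_partition_properties(s, mu):
    """Size of pi(s, mu): one grid value for each of the 2 * C(s, 2) entries."""
    return len(grid(mu)) ** (2 * comb(s, 2))

print(grid(Fraction(1, 4)))
print(num_partition_properties(3, Fraction(1, 2)))
```

With s = 3 and µ = 1/2 there are already $3^6 = 729$ partition properties, which indicates why testing all of Π(t, µ) simultaneously with few queries is not an immediate consequence of testing each one separately.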
Note that each π as above is one of the partition properties studied in [21], where it was shown that they are µ-testable with query complexity $(1/\mu)^{\mathrm{poly}(s)}$. This was improved recently to poly(s/µ) in [34]. The next lemma states that with (roughly) the same query complexity we can in fact simultaneously test all properties in Π(t, µ).
The proof of the next lemma will appear in the journal version of the paper.
▶ Lemma 18. For every t and µ > 0 there is $q = q_{18}(t, \mu) = \mathrm{poly}(t/\mu)$ satisfying the following. There is a randomized algorithm that, given a graph G, makes q queries to G and with probability at least 2/3, for every π ∈ Π(t, µ), distinguishes between the case that G satisfies π and the case that G is µ-far from π.

Proof (of Lemma 14):
Given k, ζ, γ and $f_\zeta$ as in the statement of the lemma, we define $T_0 = k$ and for i ≥ 1 define
Now set the following parameters.
and
We now describe the algorithm for finding a signature S satisfying the requirement of the lemma. For what follows let π′(s, µ) be the partition properties in which $\beta_{i,j} = \alpha_{i,j} + \mu$ for every 1 ≤ i < j ≤ s. Also for each π ∈ π′(s, µ) define the index of π to be ind(π) =
In Step-1 we run the algorithm of Lemma 18 with the parameters t, µ defined above. This is the only randomized part of the algorithm. In Step-2 of the algorithm we do the following.
Note that the query complexity of the algorithm is $q = q_{18}(t, \mu) = \mathrm{poly}(t/\mu) = \mathrm{poly}(k) \cdot 2^{\mathrm{poly}(1/\zeta)}$, as needed. Also, Lemma 18 guarantees that Step-1 of the above described algorithm succeeds with probability at least 2/3. It thus remains to show that assuming this event holds, Step-2 of the algorithm will return an $(f_\zeta, \gamma)$-final partition. First of all note that if it succeeds then it returns a partition of size at least k and at most T, as required.
The proof that if Step-1 succeeded, then Step-2 returns an $(f_\zeta, \gamma)$-final partition is identical to the proof of Claim 5.5 in [16], so we give a sketch of the proof. First, the reader might be wondering why every graph necessarily has an $(f_\zeta, \gamma)$-final partition as in the statement of the lemma. Let us actually explain why every G has an $(f_\zeta, \gamma/2)$-final partition, while using the definitions we introduced above. Start from an arbitrary equipartition $A_0$ of G into $T_0 = k$ sets, and let $\mathrm{ind}_0 = \mathrm{ind}(A_0)$ denote the index of $A_0$ as in Definition 12. If $A_0$ is $(f_\zeta, \gamma/2)$-final then we are done. If not, then there must be another partition $A_1$ of G with at least $T_0$ and at most $f_\zeta(T_0) = T_1$ parts, with index $\mathrm{ind}(A_1) \ge \mathrm{ind}(A_0) + \gamma/2$. Since 0 ≤ ind(A) ≤ 1 for every equipartition and each step raises the index by at least γ/2, this process must stop after at most 2/γ steps, ending with a partition A of size k ≤ s ≤ T so that all partitions of G into at least s and at most $f_\zeta(s)$ parts have index less than ind(A) + γ/2. But this means that A is $(f_\zeta, \gamma/2)$-final. Note that we thus get that G has an $(f_\zeta, \gamma/2)$-final partition A of size s ≤ T.
Let us now explain how to turn the above existential proof into a proof of correctness of the algorithm described earlier. Let $M_G(s)$ denote the largest index of an equipartition of G of size s. First we claim that for every k ≤ s ≤ t,
For the second inequality in (4), let A be an equipartition with s parts such that $M_G(s) = \mathrm{ind}(A)$. Let π ∈ π′(s, µ) be the partition property obtained from A by rounding down the densities to the closest integer multiple of µ. Then we have
For the first inequality in (4), let π ∈ π′(s, µ) be a partition property which the algorithm accepted and such that M(s) = ind(π). Then G must be µ-close to π (as otherwise π should have been rejected). Let G′ be a graph µ-close to G that satisfies π, and let A be the vertex partition of G′ witnessing that G′ satisfies π. Note that when turning G into G′, for each pair of parts of A, we change the density between this pair by at most $\mu s^2$. Hence, in G, the partition property π is a $2\mu s^2$-signature of A (here and in what follows, we view π as a signature). So
It follows from the existential proof above that there is $k \le s^\star \le T$ and an equipartition A of G into $s^\star$ parts which is $(f_\zeta, \gamma/2)$-final. We can assume that $M_G(s^\star) = \mathrm{ind}(A)$, because the equipartition satisfying this must also be final. We have
So the algorithm will return a partition.
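The rounding step used for the second inequality in (4) can be sketched as follows: every density is rounded down to a multiple of µ to give the lower bound $\alpha_{i,j}$, and the upper bound is $\beta_{i,j} = \alpha_{i,j} + \mu$ as in π′(s, µ). The names and the dictionary representation are assumptions for illustration. Clamping a density of exactly 1 to 1 − µ is our own choice, made so that $\beta_{i,j} \le 1$ as Definition 17 requires; the text does not discuss this edge case.

```python
from fractions import Fraction
from math import floor

def round_down_to_grid(densities, mu):
    """Map each pair density d to (alpha, beta) with alpha a multiple of mu,
    beta = alpha + mu, and alpha <= d <= beta."""
    lower, upper = {}, {}
    for pair, d in densities.items():
        a = mu * floor(d / mu)   # round down to an integer multiple of mu
        a = min(a, 1 - mu)       # assumption: keeps beta <= 1 when d == 1
        lower[pair] = a
        upper[pair] = a + mu
    return lower, upper

densities = {(0, 1): Fraction(7, 10), (0, 2): Fraction(1), (1, 2): Fraction(0)}
lower, upper = round_down_to_grid(densities, Fraction(1, 4))
print(lower[(0, 1)], upper[(0, 1)])
```

For the density 7/10 and µ = 1/4 this gives the interval [1/2, 3/4]; exact rationals are used to avoid floating-point artefacts at grid points.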
Note that the algorithm does not necessarily return the same signature/partition-property as above π that is µ-close to the above partition A. The reason for the algorithm to choose a different partition is that there might be another partition of size s with a larger index (which is of course also $(f_\zeta, \gamma)$-final) or there might be an s* < s with the same properties, or there might be other partitions with the same index. However, one can invert the reasoning in the previous paragraph and show that if a π is returned then it must be the γ-signature of an $(f_\zeta, \gamma)$-final partition. ◀

Proof of Lemma 15

Preliminary lemmas
In this subsection we describe some preliminary lemmas that will be used in the next subsection in which we prove Lemma 15. We start with introducing the Frieze-Kannan regularity lemma [17,18]. We first state their notion of γ-regularity.
▶ Definition 19 (Frieze-Kannan Regularity [18]). Let G = (V, E) be a graph and $A = \{V_1, \dots, V_k\}$ be an equipartition of V(G). For a subset X ⊆ V and
Roughly speaking, a partition A is γ-Frieze-Kannan-regular, or γ-FK-regular for short, if we can estimate the number of edges between large sets S, T from the intersection sizes $|S \cap V_i|$ and $|T \cap V_i|$. We will also need the following slightly stronger notion of weak regularity that was introduced in [29].
▶ Definition 20 (Frieze-Kannan Regularity⋆ [29]). In the setting of Definition 19, we say that A is γ-Frieze-Kannan Regular⋆ if:
The translation between these two notions will be crucial in Lemma 28 below. Suppose
The following lemma is proved in [29] using a simple variant of the original proof of Frieze and Kannan [18].
▶ Lemma 21 (Frieze-Kannan Weak Regularity Lemma [18,29]). For every $k_0$ and γ > 0 there is $T = T_{21}(k_0, \gamma) = k_0 \cdot 2^{\mathrm{poly}(1/\gamma)}$ so that the following holds for every graph G on at least T vertices. If A is an equipartition of V(G) into at most $k_0$ sets, then there is a refinement B of A into at most T sets such that $d^{\star B}_{\square}(G) < \gamma$.
Let us now extend the definition of $d_\square$ to distance between pairs of weighted graphs, where a weighted graph R is a complete graph, so that every edge (i, j) is assigned a weight 0 ≤ R(i, j) ≤ 1.
If $R, R'$ are two weighted graphs on $n$ vertices then we define and where the maximum is taken over all functions $\alpha, \beta$ :

▶ Definition 22 ($\mathrm{ind}(F, R)$). Let $R$ be a weighted graph on $[k]$ and let $\varphi$ be an injective function $\varphi$ : In the case of $\varphi$ not being injective, we define

Note that we can think of a signature $S = (\eta_{i,j})_{1 \le i < j \le t}$ as a weighted graph on $t$ vertices. This means that for a pair of signatures $S, S'$ we can define $d_1(S, S')$ and $d_{\square}(S, S')$ as in (7) and (8) respectively, and we can also define $\mathrm{ind}(F, S)$ as in (9). We will need the following lemmas from [25]. The proof of the next two lemmas will appear in the journal version of the paper.

▶ Lemma 23. Suppose $R, R'$ are two weighted graphs on $n$ vertices, and $H$ is a graph on $h$ vertices. Then for any $\gamma$

Given a graph $G$ on $n$ vertices, and an equipartition $A = \{V_1, \ldots, V_k\}$, we define the graph $G_A$ on $V(G)$ to be the weighted graph with weights $G_A(u, v) = d(V_i, V_j)$ for every $u \in V_i$ and $v \in V_j$. Let $S_A$ be the 0-signature of $A$, that is, the weighted graph on $k$ vertices with $S_A(i, j) = d(V_i, V_j)$. Observe that if $k$ divides $n$ (so all sets of $A$ are of equal size) then $\mathrm{ind}(H, G_A)$ is almost the same as $\mathrm{ind}(H, S_A)$. It is not hard to see that for general equipartitions these quantities do not differ by much.
▶ Lemma 24. Given a graph $G$ on $n$ vertices, and an equipartition $A = \{V_1, \ldots, V_k\}$, let $G_A$ and $S_A$ be defined as above. Then |ind(H, n for every graph $H$ on $h$ vertices.

We now combine the above facts to conclude that a signature of a $\gamma$-FK-regular partition of a graph gives a good approximation of $\mathrm{ind}(H, G)$. The proof of the next lemma will appear in the journal version of the paper.

▶ Definition 26 (Extension). Given a signature $S = (\eta_{i,j})_{1 \le i < j \le t}$ of an equipartition $A$, and a refinement $B = \{W_1, \ldots, W_s\}$ of $A$, the extension of $S$ to $B$ is the sequence $S' = (\eta'_{i,j})_{1 \le i < j \le s}$ defined by $\eta'_{i,j} = \eta_{k,l}$ if there exist $k \ne l$ such that $W_i \subseteq V_k$ and $W_j \subseteq V_l$, and $\eta'_{i,j} = 0$ if $W_i$ and $W_j$ are both subsets of the same $V_k$.
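The extension of Definition 26 is a purely combinatorial operation, and a minimal Python sketch may help to fix the indexing. The data representation is an assumption of this sketch: parts are 0-indexed, a signature is a dictionary keyed by pairs $(k, l)$ with $k < l$, and when $W_i \subseteq V_k$, $W_j \subseteq V_l$ with $k > l$ the value $\eta_{l,k}$ is used (that is, the signature is read symmetrically). The function name `extend_signature` is hypothetical.

```python
def extend_signature(eta, parts, refinement):
    """Extend a signature of `parts` to a refinement, following Definition 26.

    eta maps pairs (k, l) with k < l to densities; parts and refinement are
    lists of vertex sets, and each set of refinement lies inside one part.
    """
    # parent[i] is the index k of the unique part V_k that contains W_i.
    parent = []
    for W in refinement:
        k = next(k for k, V in enumerate(parts) if W <= V)
        parent.append(k)

    ext = {}
    for i in range(len(refinement)):
        for j in range(i + 1, len(refinement)):
            k, l = parent[i], parent[j]
            # Same part: 0. Different parts: inherit the density of (V_k, V_l).
            ext[(i, j)] = 0.0 if k == l else eta[(min(k, l), max(k, l))]
    return ext
```

For instance, splitting each of two parts into singletons copies the single density $\eta_{0,1}$ to all four cross pairs and assigns 0 to the two pairs that stay inside one part.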
The proof of the next claim will appear in the journal version of the paper.

▷ Claim 27. For every $\varepsilon$ and $s$ there exist $r = r_{27}(\varepsilon) = \mathrm{poly}(1/\varepsilon)$ and $N = N_{27}(\varepsilon, s) = \mathrm{poly}(s/\varepsilon)$ so that the following holds for every pair of graphs $G, G'$ on the same set of $n \ge N$ vertices. If $G, G'$ are $\alpha$-close and $S, S'$ are $\gamma$-, $\gamma'$-signatures of $G, G'$ respectively, of the same equipartition $A$ of the vertex set of $G, G'$ into $s \ge r$ sets, then $d_1(S, S') \le \alpha + \varepsilon + 2(\gamma + \gamma')$.
The proof of the next lemma will appear in the journal version of the paper.

▶ Lemma 28. For every $\varepsilon$ and $t$ there exist $\gamma = \gamma_{28}(\varepsilon) = \mathrm{poly}(\varepsilon)$ and $N = N_{28}(t, \varepsilon) = \mathrm{poly}(t/\varepsilon)$, so that for every graph $G$ on $n \ge N$ vertices, if $S$ is a $\gamma$-signature of a $\gamma$-FK-regular⋆ partition $A$ of $G$ with $t$ sets, then for every signature $S'$ satisfying $d_1(S, S') \le \delta$ for some $\delta$, there is a graph $G'$ that is $(\delta + \varepsilon)$-close to $G$, so that $A$ is an $\varepsilon$-FK-regular partition of $G'$, and $S'$ is an $\varepsilon$-signature of $A$.
We will also need the following lemmas.
▶ Lemma 29 ([2] Lemma 3.7). For every $\varepsilon, t$ there exist $\gamma = \gamma_{29}(\varepsilon) = \mathrm{poly}(\varepsilon)$ and $N = N_{29}(t, \varepsilon) = \mathrm{poly}(t/\varepsilon)$ satisfying the following. Assume $A$ is an equipartition into $s$ sets of a graph $G$ with $n \ge N$ vertices, and that $B$ is a refinement of $A$ into at most $t$ sets. Assume further that $S$ is any $\gamma$-signature of $A$, and that $T$ is its extension to $B$. If $B$ satisfies $\mathrm{ind}(B) \le \mathrm{ind}(A) + \gamma$, then $T$ is an $\varepsilon$-signature for $B$.

APPROX/RANDOM 2023

▶ Lemma 30 ([16] Lemma 6.6). For every $\varepsilon, t$ there exists $N = N_{30}(t, \varepsilon) = \mathrm{poly}(t/\varepsilon)$ so that for every equipartition $A$ of $G$ with $n \ge N$ vertices into $s$ sets, and every refinement $B$ of $A$ into at most $t$ sets, $\mathrm{ind}(B) \ge \mathrm{ind}(A) - \varepsilon$.
The next observation is implicit in the proof of the Frieze-Kannan Regularity Lemma (i.e., Lemma 21). The main step of the proof involves showing that if $A$ is an equipartition of $G$ into $t$ parts and $A$ is not $\varepsilon$-FK-regular⋆, then $A$ has a refinement $B$ into $k \le 16t/\varepsilon^4$ sets so that $\mathrm{ind}(B) \ge \mathrm{ind}(A) + \frac{\varepsilon^4}{2}$ (see, e.g., the proof of Theorem 1.1 in [33] and the proof of Theorem 6 in [29]).
▶ Lemma 31. For every $\varepsilon > 0$ there exists $\gamma = \gamma_{31}(\varepsilon) = \mathrm{poly}(\varepsilon)$ and f = f

The proof of the next lemma will appear in the journal version of the paper.
▶ Lemma 32. For every $s$ and $\varepsilon > 0$ there are $\gamma = \gamma_{32}(\varepsilon)$, and the following holds. Suppose $G$ has at least $N$ vertices and $A$ is an $(f, \gamma)$-final partition of $G$ into at most $s$ sets and that $S$ is a $\gamma$-signature of $A$. Then for every $G'$ on the same vertex set as $G$, there exists a refinement $A'$ of $A$ into $t \le T$ sets so that
(i) $A'$ is an $\varepsilon$-FK-regular⋆ partition of $G'$.
(ii) Every refinement $A''$ of $A$ with $t \le T$ sets (and in particular $A'$) is an $\varepsilon$-FK-regular⋆ partition of $G$.
(iii) For every refinement $A''$ of $A$ with $t \le T$ sets, the extension $S''$ of $S$ (in the sense of Definition 26) with respect to $A''$ is an $\varepsilon$-signature of $A''$ with respect to $G$ (note that $A'$ is such an $A''$).
refinement of $A$ with respect to $G$) asserts that $S'$ is a $\gamma_0$-signature of $A'$ (with respect to $G$), which by the choice of $\gamma_0$ means that it is a $\gamma_{28}(\min\{\varepsilon/2, \gamma_{25}(h, \delta/6)\})$-signature for $A'$ with respect to $G$. Now, Lemma 28 (applied with $A'$ as the $\gamma_0$-FK-regular⋆ partition of $G$, with $S'$ in the role of $S$ and $C$ in the role of $S'$) implies that there is a graph $G'$ that is $(\alpha - \varepsilon/2 + \varepsilon/2)$-close to $G$, namely $\alpha$-close to $G$, and for which $C$ is a $\gamma_{25}(h, \delta/6)$-signature of $A'$, which in turn is $\gamma_{25}(h, \delta/6)$-FK-regular over $G'$. Lemma 25 implies that $|\mathrm{ind}(H, G') - \mathrm{ind}(H, C)| \le \delta/6$ for all $H \in \mathcal{H}$. Thus, $\mathrm{ind}(H, G') < \delta/2 + \delta/6 < \delta$ for all $H \in \mathcal{H}$, as required. Hence we have found the required $G'$.

5 Proof of Theorem 9

The proof of Theorem 9 is very similar to that of Theorem 4. To assist the reader who is already familiar with the proof of Theorem 4, we point out in several places which lemmas are analogous to lemmas introduced in the previous sections. The idea is the following: by a theorem of Goldreich and Trevisan [22], every testable property is testable by a canonical tester, which samples a set of vertices of size $q = q_P(\varepsilon)$ and accepts/rejects based on the graph induced by these $q$ vertices. The acceptance/rejection of the algorithm therefore depends only on the number of induced copies in $G$ of graphs on $q$ vertices. Consequently, turning a graph into a graph satisfying $P$ is equivalent to turning it into a graph with a certain number of copies of certain graphs on $q$ vertices. This is very similar to the case of Theorem 4, where we wanted to have a very small number of copies of graphs not in $P$. The additional exponential factor arises because we need to control the number of induced copies of all graphs on $q$ vertices.

We now state the key lemmas, which are variants of lemmas we used in the proof of Theorem 4.

Suppose that $\mu$ and $\nu$ are two probability distributions over graphs with vertex set $\{v_1, \ldots, v_q\}$, where each edge $v_i v_j$ is independently chosen to be an edge with probability $\mu_{i,j}$ and $\nu_{i,j}$ respectively. If $|\mu_{i,j} - \nu_{i,j}| \le \varepsilon / \binom{q}{2}$ for every $1 \le i < j \le q$, then the variation distance between $\mu$ and $\nu$ is bounded by $\varepsilon$.
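The bound on the variation distance can be checked by a short computation. The sketch below relies on a standard fact that is not stated in the text, namely that the variation distance between two product distributions is at most the sum of the variation distances of their coordinates. The notation $d_{TV}$ and $\mathrm{Ber}(p)$ (a Bernoulli variable with success probability $p$) is introduced here for illustration, and the variation distance is assumed to be normalized as $\max_A |\mu(A) - \nu(A)|$.

```latex
\begin{align*}
d_{TV}(\mu, \nu)
  &\le \sum_{1 \le i < j \le q}
       d_{TV}\bigl(\mathrm{Ber}(\mu_{i,j}), \mathrm{Ber}(\nu_{i,j})\bigr)
     && \text{(subadditivity over independent edges)} \\
  &= \sum_{1 \le i < j \le q} |\mu_{i,j} - \nu_{i,j}|
     && \text{(variation distance of two Bernoulli variables)} \\
  &\le \binom{q}{2} \cdot \frac{\varepsilon}{\binom{q}{2}} = \varepsilon .
\end{align*}
```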
▶ Definition 36 ($q$-statistic). The $q$-statistic of a graph $G$ is the probability distribution over all (labeled) graphs with $q$ vertices that results from picking at random $q$ distinct vertices of $G$ and considering the induced subgraph. For a given graph $H$ we denote the probability of obtaining $H$ when drawing a graph according to the $q$-statistic by $\Pr_G(H)$.

▶ Definition 37. For an equipartition $A = \{V_1, \ldots, V_t\}$ of $G$, and a signature $S = (\eta_{i,j})_{1 \le i < j \le t}$ of $A$, the perceived $q$-statistic according to $S$ is the following distribution $\Pr_S$ over labeled graphs with $q$ vertices $v_1, \ldots, v_q$. Start by choosing a uniformly random sequence without repetitions of indices $i_1, \ldots, i_q$ from $1, \ldots, t$. Then, independently, take every $v_k v_l$ for $k < l$ to be an edge with probability $\eta_{i_k, i_l}$ if $i_k < i_l$ and with probability $\eta_{i_l, i_k}$ if $i_l < i_k$. Then $\Pr_S(H)$ is defined as the probability that the resulting labeled graph equals $H$.
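The sampling procedure of Definition 37 can be turned into an exact (brute-force) computation of $\Pr_S(H)$ by averaging over all sequences of $q$ distinct part indices. The following Python sketch does this; the 0-indexed representation of parts and vertices, the dictionary encoding of the signature, and the function name `perceived_q_statistic` are assumptions made for illustration, and the enumeration is only feasible for very small $t$ and $q$.

```python
from itertools import combinations, permutations

def perceived_q_statistic(eta, t, q, edges):
    """Pr_S(H) from Definition 37, for a labeled graph H on vertices 0..q-1.

    eta maps pairs (i, j) with i < j (0-indexed parts) to densities;
    edges is the set of pairs (k, l), k < l, forming the edge set of H.
    """
    if q > t:
        raise ValueError("need at least q parts to pick q distinct indices")
    total, count = 0.0, 0
    # Every sequence of q distinct part indices is equally likely.
    for seq in permutations(range(t), q):
        p = 1.0
        for k, l in combinations(range(q), 2):
            a, b = sorted((seq[k], seq[l]))
            # v_k v_l is an edge with probability eta[a, b], independently.
            p *= eta[(a, b)] if (k, l) in edges else 1.0 - eta[(a, b)]
        total += p
        count += 1
    return total / count
```

With $q = 2$ the value for the single-edge graph is simply the average of the entries of the signature, which gives a quick sanity check of the indexing.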
The following lemma will replace Lemma 1 in the proof of Theorem 9.
▶ Lemma 38 (see [22]). If there is an $\varepsilon$-test for a graph property $P$ that makes $Q = Q(\varepsilon)$ edge queries, then there exists an appropriate family $\mathcal{H}$ of labeled graphs on $q = 2Q$ vertices such that any graph $G$ which satisfies $P$ also satisfies $\Pr_G(\mathcal{H}) \ge \frac{2}{3}$, and any graph $G$ that is $\varepsilon$-far from satisfying $P$ also satisfies $\Pr_G(\mathcal{H}) < \frac{1}{3}$.

We now introduce a variant of Lemma 25 that is suited for the proof of Theorem 9. The proof of the lemma will appear in the journal version of the paper.

We now introduce a variant of Lemma 15 that is suited for the proof of Theorem 9. The proof of the lemma will appear in the journal version of the paper.
▶ Lemma 40. For every $q$ and $\varepsilon$ there exist $\gamma = \gamma_{40}(q, \varepsilon)$, $s = s_{40}(q, \varepsilon)$ and $f = f^{(q,\varepsilon)}_{40}$, where $s = \mathrm{poly}(2^{q^2}/\varepsilon)$ and $f^{(q,\varepsilon)}_{40}(x) = x \cdot 2^{\mathrm{poly}(2^{q^2}/\varepsilon)}$, with the following property. For every family $\mathcal{H}$ of graphs with $q$ vertices, there exists a deterministic algorithm that receives as an input a $\gamma$-signature $S$ of an $(f, \gamma)$-final partition $A$ into $t \ge s$ sets of a graph $G$ with $n \ge N_{40}(q, \varepsilon, t) = t \cdot 2^{\mathrm{poly}(1/\varepsilon) \cdot 2^{\mathrm{poly}(q)}}$ vertices and distinguishes, given any $\alpha$, between the following two cases:
(i) $G$ is $(\alpha - \varepsilon)$-close to some graph $G'$ for which $\Pr_{G'}(\mathcal{H}) \ge \frac{2}{3}$.
(ii) $G$ is $\alpha$-far from every $G'$ for which $\Pr_{G'}(\mathcal{H}) \ge \frac{1}{3}$.

Theorem 9 is derived from Lemmas 14 and 40, similarly to how Theorem 4 is derived from Lemmas 14 and 15. This will appear in the journal version of the paper.

▶ Lemma 25. For every $h$, $k$ and $\delta > 0$ there are $\gamma = \gamma_{25}(h, \delta) = \mathrm{poly}(\delta/h)$, $r = r_{25}(h, \delta) = \mathrm{poly}(h/\delta)$ and $N = N_{25}(h, k, \delta) = \mathrm{poly}(hk/\delta)$, so that if $G$ is a graph on at least $N$ vertices, and $A$ is a $\gamma$-FK-regular partition of $G$ with at least $r$ and up to $k$ parts, then for every $\gamma$-signature $S$ of $A$, we have $|\mathrm{ind}(H, G) - \mathrm{ind}(H, S)| \le \delta$ for every $H$ on $h$ vertices.