Tolerant Testers for Subgraph-Freeness
Abstract
In this paper we study the problem of tolerantly testing the property of being -free (which also implies distance approximation from being -free).
In the general-graphs model, we show that for tolerant -freeness testing can be achieved with query complexity that is polynomial in the arboricity of the input graph , , and independent of the size of (for graphs in which the average degree is .
Specifically for triangles, our algorithm distinguished graphs which are -close to being triangle-free from graphs that -far from being triangle-free with expected query complexity which is (for constant and ).
For general -cliques our algorithm distinguishes graphs which are -close to being -free from graphs which are -far from being -free with expected query complexity which is polynomial in , , and .
We then generalize our result and provide a similar result for any motif which is -connected of radius . This includes for example the wheel-graph.
Finally, we show that our tester can be applied to the bounded-degree model for tolerantly testing -freeness for any motif . The query complexity of the algorithm is polynomial in the degree bound, , improving the previous state-of-the-art by Marko and Ron (TALG 2009) that obtained quasi-polynomial query complexity in .
Keywords and phrases:
Tolerant Testing, Property Testing, Subgraph freeness, distance approximation, arboricityFunding:
Reut Levi: The author was supported by the Israel Science Foundation under Grant 1867/20.Copyright and License:
2012 ACM Subject Classification:
Theory of computation Streaming, sublinear and near linear time algorithmsEditors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
The problem of testing -freeness, where is a small motif (namely, of size which is independent of the input graph), has received a lot of attention due to its fundamental importance for various fields including biology, sociology and network science. Detecting specific patterns, such as subgraphs, can reveal community structures, vulnerabilities, or anomalous behavior.
In the realm of property testing the decision problem is relaxed such that we are only aiming to distinguish graphs which are -free from graphs that are far from being -free (according to some predetermined distance measure). However, handling real-world data often means dealing with noise, incompleteness, or small errors. In these scenarios, tolerant testing (introduced by Parnas, Ron and Rubinfeld [15]) for -freeness becomes crucial, as it allows us to distinguish graphs that are nearly -free from those that are far from being -free, thereby bypassing the problem of imperfections in the data. More specifically, a tolerant property testing algorithm is required to distinguish objects that are -close to having a given property from objects that are -far from having , for some parameters . Clearly, by definition, tolerant-testing is at least as hard as testing. In fact, the separation between tolerant and non-tolerant testing can be very dramatic [8]. Another benefit of tolerant testing is that it implies (by a simple reduction) distance approximation where the ratio of approximation depends on the parameters , (see more details in [15]). Therefore, whenever feasible, we should aim to develop tolerant testers, ideally without incurring a significant overhead in query complexity.
One of the early works on tolerant testing by Marko and Ron [13] showed that it is possible to tolerantly test -freeness in the bounded-degree model [9] for any motif with query complexity that is quasi-polynomial in the degree bound and independent of the size of the graph.
In the regime of non-tolerant testers, Levi [12] showed that it is possible to test triangle-freeness with query complexity that depends linearly in the arboricity of the graph (which is tight for graphs with arboricity ) and is independent of the size of the graph (assuming the average degree is ). Since graphs of bounded degree have bounded arboricity, this result applies to a broader family of graphs. However, this tester is not tolerant.
One natural question is whether the result in [12] can be extended to show that tolerant testing of triangle-freeness (and ideally other motifs) is possible in the general-graphs model with query complexity that depends only on the arboricity of the graph. Such a result would also generalize [13] (since bounded-degree graphs inherently have bounded arboricity). Another important question is whether the query complexity of subgraph freeness in the bounded-degree model can be improved to be polynomial in the degree bound (rather than quasi-polynomial). In this paper, we answer both questions affirmatively.
1.1 Our Results
1.1.1 The general-graphs model
Tolerant Testing -freeness
Our tester for the property of being -free distinguishes graphs that are -close to being free from graphs that are -far from being -free, where and are parameters. The expected query complexity of the algorithm is where and is the average degree (see Theorem 21).
Specifically, assuming , for the case of triangles, our algorithm distinguished graphs which are -close to being triangle-free from graphs that -far from being triangle-free with expected query complexity which is .
For -cliques where , our algorithm distinguishes graphs which are -close to being -free from graphs which are -far from being -free with expected query complexity which is .
Tolerant -freeness for which is -connected with radius
We extend our result and apply our tester for motifs that are -connected with radius . This includes for example the wheel-graph 111Any connected graph with an additional vertex that is incident to all other vertices is -connected with radius .. Specifically our tester distinguishes graphs that are -close to being free from graphs that are -far from being -free. The expected query complexity of the algorithm is where and (see Theorem 28).
Specifically, assuming , the expected query complexity of our tester is .
1.1.2 The bounded-degree graphs model
Tolerant -freeness for any
We show that our tolerant tester can be applied to the bounded-degree model to test -freeness for any motif . Specifically the tester distinguishes graphs that are -close to being free from graphs that are -far from being -free. The expected query complexity of the algorithm is where and is the degree bound.
Specifically, assuming , the expected query complexity of our tester is .
This improves the result in [13] that has query complexity that is quasi-polynomial in to polynomial dependence in .
Remark 1.
As noted in [13], the result of Bogdanov, Obata and Trevisan [2] implies a linear lower bound on the query complexity of approximating the distance from being triangle-freeness for some small multiplicative and/or additive error (even in bounded degree graphs). Therefore, we can not hope to obtain tolerant testers of subgraph freeness for all parameters .
Remark 2.
As shown in [3], testing -freeness requires queries. Therefore we can not hope to obtain tolerant testers (in the general-graphs model) for general motifs whose query complexity depends only on the arboricity of the graph.
1.2 Our Algorithm
In this section we describe our algorithm for tolerantly test -freeness. This gives a good perspective of our approach. The first ingredient of our algorithm, which also appears in [12], is to perform the test on a subgraph of . This subgraph, which we refer to as is the graph after removing the edges between -heavy vertices, where a vertex is -heavy if its degree is at least where . By doing this modification we might decrease the distance by at most , so if the graph is sufficiently far from being -free, we will still have enough witnesses of violation. On the positive side, since this subgraph does not have edges in which both endpoints are -heavy, any in this graph will include at most a single -heavy vertex and therefore can be revealed by exploring the neighborhood of the non-heavy vertices of the clique.
The second ingredient of our algorithm is to approximate the number of copies of in (it will become clear later why we need to approximate this parameter). This can be done by sampling vertices, , u.a.r. from , checking if the degree of is bounded by and if so selecting random indices in . We then perform neighbor-queries on , according to the selected indices, and check if the subgraph induced on these neighbors (if such neighbors exist) induces a -clique. Thus, our sample space is of size and if the graph is -far from being -free then we have at least copies of so we need (which is when ) trials to get a good approximation.
Before we describe the last ingredient of our algorithm we shall define the -copies graph of , which we denote by . In this graph the vertex set is the set of -copies in and a pair of copies are adjacent iff they share at least one edge. Consider a maximal independent set (MIS), , of this graph. By definition, the copies of this set are edge-disjoint. Thus the minimum number of edges we need to remove from in order to make it -free, which we denote by , is at least . On the other hand, by the maximality of it follows that (since we can remove all -copies in by removing all the edges of the copies in ). Therefore if we can get a good approximation of any MIS of , then we will get a good approximation (up to a -factor) to the distance of from being -free. This is also true for any MIS that is constructed greedily according to some order on the vertices.
Fortunately, as shown by the ingenious work of Yoshida, Yamamoto and Ito [17], if we pick a random vertex in a graph and a random ordering on the vertices of the graph then the expected number of recursive calls we need to perform in order to locally simulate the outcome of the greedy MIS algorithm on that graph (according to the random ordering we picked) equals the average degree of the graph plus . Therefore, we can get a good approximation to the average probability that a random -copy belongs to a random greedy MIS (which is determined by a random ordering) by executing the local simulation on random -copies and a random orderings (both selected uniformly and independently on each trial). Since the ratio between any MIS in a graph to the total number of vertices in is at most , where denotes the maximum degree in , we obtain that it suffices to evaluate the outcome of a greedy MIS (each time according to a new random ordering) on copies selected uniformly at random. This will give us a good approximation to the average probability that a random -copy belongs to a random greedy MIS. When we multiply this by the approximation to the total number of -copies we previously obtained, this will provide a good approximation to the distance of from being -free.
1.2.1 The extension for other motifs
We describe the above mentioned algorithm as a general framework for tolerant testing of -freeness and then implement (according to the defined interface) the necessary specific procedures for testing -freeness and then provide more general procedures for -freeness of any motif which is -connected and has radius . Finally, we show that the procedures for the latter family of motifs can be used to test -freeness for any motif in the bounded degree model.
1.3 Related Work
1.3.1 Tolerant tesing of -freeness in the bounded degree model
As mentioned-above Marko and Ron [13] studied the problem of tolerantly testing -freeness in the bounded degree model. They begin by describing a global algorithm that constructs a cover of edges by first performing iterations in which the edges of some of the -copies are added to the cover by a random process (that can be viewed as a distributed randomized algorithm for constructing a large independent set of copies) and then a single edge is added to the cover from each one of the uncovered copies. Since the global algorithm has a distributed nature they show they can locally simulate its outcome by performing queries and therefore obtain a good approximation to the size of the cover which gives a good approximation to the distance from being -free.
1.3.2 Non-tolerant testing of subgraph-freeness
In the bounded degree model Goldreich and Ron [9] observed that it is possible to test triangle freeness with query complexity in graphs of maximum degree bounded by some constant.
The problem of testing triangle freeness in the general graph model was first studied by Alon, Kaufman, Krivelevich, Ron [1]. The query complexity of their algorithms depends on and , the number of vertices in the graph and the average degree, respectively. They provided sublinear upper bounds for almost the entire range of parameters. Moreover, their upper bounds are at most quadratic in their lower bounds. However, they are tight only when and , where denotes the maximum degree and denotes the average degree of the graph or when . Shortly after, Rast [16] and Gugelmann [10] improved the upper bound and lower bound of [1], receptively, for some ranges of the parameters.
More recently, Levi [12] studied the complexity of testing triangle freeness as a function of the arboricity of the graph (in the general graphs model). It was shown that the complexity of testing triangle freeness depends linearly in the arboricity of the graph for graphs in which the arboricity is and the average degree is .
Even more recently, Eden, Levi and Ron [3] studied the problem of testing -freeness and showed that if the motif is not a -clique then it is not always possible to provide a tester whose query complexity is independent of the size of the graph. In particular, they showed that testing -freeness requires queries.
1.3.3 Sublinear algorithms that receive the arboricity of the graph as a parameter
Several sublinear-time graph algorithms for counting and sampling give improved results when the graph has bounded arboricity. This includes the following results. Eden, Ron and Rosenbaum [4] designed an algorithm for sampling edges almost uniformly.
Eden, Ron and Seshadhri [6] estimate the degree distribution moments of an undirected graph and in particular estimate the average degree of a graph.
2 Preliminaries
We consider the general-graphs model [14, 11], where the algorithm can perform the following types of queries:
-
1.
For any , query for ’s degree;
-
2.
For any vertex and index , query for ’s -th neighbor (if , then a special symbol is returned);
-
3.
For any pair of vertices , query whether .
We denote by the maximum degree of the graph . We denote by the distance of from being -free, namely, the fraction of edges that we need to remove from to make it -free. We denote by , namely, the minimum number of edges we need to remove from in order to make is -free. For a graph we use to denote its vertex set and to denote its edge set.
When we say with high constant probability (w.h.c.p.) we mean we can adjust the high constant probability without changing the asymptotic complexity of the algorithm.
Definition 3 (Copies of a motif ).
For a graph and a motif , we say that a subgraph of is a copy of in if is isomorphic to .
We use to denote the number of -copies in .
Given a threshold , we call the vertices, , in such that , -heavy and otherwise we call them -light.
Definition 4 (The graph ).
For a graph and a threshold , the graph is the graph whose vertex set is and its edge set is:
Definition 5.
The arboricity of a graph , denoted by , is the minimum number of forests into which the edges of the graph can be partitioned.
It is known that for where , (see, e.g., Claim 4 in [12]).
We assume that the arboricity of the input graph as well as the number of edges, , and vertices, , are known to the algorithm.
Remark 6.
If the algorithm is not provided with the arboricity of the graph, then it may run the procedure from [12] that obtains the effective arboricity of the graph. More specifically, it was shown in [12] how to obtain a value that with high constant probability satisfies the following: (1) ; (2) The number of edges between vertices whose degree is at least is at most , which is suffices for our needs. Up to polylogarithmic factors in and polynomial factors in , the query complexity of the procedure is assuming the average degree is .
2.1 The -copies graph of
In order to describe our algorithm we shall use the following definitions.
Definition 7.
Given a graph and a motif we define the -copies graph of , to be the graph whose vertex set are the copies of in , and belongs to the edge set of iff and share at least one edge.
Next, we define the notion of an estimator for the number of -copies of a graph . The estimator is given a guessed lower bound on . If the guess is accurate, the estimator produces a reliable estimate; otherwise, it does not significantly overshoot, as detailed in the following definition.
Definition 8.
We say that an algorithm is a -copies number estimator for a graph if it receives as parameters , a guess of a lower bound on , and , the approximation parameter, and possibly other parameters for which the following holds: If then w.h.c.p. the return value of the algorithm is in . Otherwise, w.h.c.p. it is at most .
Remark 9.
The role of the guess is to provide some (indirect) control to the user of the estimator on the number of attempts to sample a copy from the graph by the estimator (e.g. if the graph is -free then the estimator might make too many attempts to sample a -copy with no success).
Definition 10.
An algorithm is a -copies oracle of a graph if it supports the following queries.
-
1.
On query Random-copy with parameters and , it returns a copy of in uniformly at random or fails. If the number of copies in is at least then it returns a copy w.p. at least .
-
2.
On query All-neighbors(), where is a copy of in , it returns all the neighbors of in (see Definition 7).
2.2 Local Simulation of Greedy Maximal IS
Our algorithm uses a local simulation of a greedy MIS algorithm. Algorithm 1 describes this local simulation. For a vertex set , we let denote the set of all permutations over .
Input: Query access to a graph and parameters and
-
1.
Perform an All-Neighbors query on and sort its neighbors according to . Let denote its neighbors according to this order.
-
2.
For
-
(a)
If precedes in , recursively call Local-Simulation-Greedy-MIS with parameters and . If the return value is YES then return NO.
-
(a)
-
3.
Return YES.
Let denote the number of (recursive) calls to Local-Simulation-Greedy-MIS during the evaluation of Local-Simulation-Greedy-MIS on , and . We shall use the following result from the work of Yoshida, Yamamoto and Ito [17].
Theorem 11 ([17]).
For any graph with vertices and edges,
3 The algorithm for tolerant testing of -freeness
Before describing our algorithm we first relate the distance from being -free to the size of a Maximal-Independent-Set of as stated in the following claim. We then use this relation to provide a relation between the expectation of a random variable to the distance from being -free.
Claim 12.
For any maximal independent set of , , it holds that
Proof.
Since is an independent set in , it follows that any pair is a pair of edge-disjoint copies of in (by definition). Since is maximal, it follows that if we remove for each all its edges from then becomes -free (since we remove at least one edge from each copy of in ).
Claim 13.
For that is drawn u.a.r. from and which is drawn u.a.r. from it follows that:
| (1) |
Moreover,
| (2) |
Proof.
By Claim 12, for any maximal IS in , , it holds that . Therefore, for any such it holds that , where is drawn u.a.r. from . This is true since . Since is maximal for any (by definition), the claim follows. Equation 2 follows from the fact that each vertex in the independent set covers at most vertices from the graph. Thus, any maximal independent set is of size at least , in particular, (for any ). This concludes the proof.
By using the previous claims we arrive to the following claim.
Claim 14.
If then with high constant probability, the return value of Approximate-Average-MIS-In-Auxiliary-Graph is in
| (3) |
Proof.
Let denote the event that in all the executions of Step 2a of the algorithm, a copy of was returned. By definition 10, the setting of and since , w.h.c.p., occurs. We henceforth assume occurs. Consider the random variable (see Step 2c of the algorithm). By construction takes values in . By Claim 13, Equation 1, . By Equation 2, . The claim follows from the setting of and the multiplicative Chernoff’s bound (see Theorem 30).
Input: , and query access to a graph
-
1.
Set and
-
2.
Run the number of -copies estimator for the graph with parameters and (see Definition 8) and obtain .
-
3.
If is less than then return ACCEPT.
-
4.
Run Approximate-Average-MIS-In-Auxiliary-Graph on the graph with parameters and and . Let denote the returned value.
-
5.
if then return REJECT.
We next prove our main theorem for this section. In order to realize this theorem we provide in the next sections concrete -copies oracles and -copies number estimators (first for which is a -clique and then generalize, for which is -connected and has radius ).
Theorem 15.
Given parameters , and query access to a graph , Tolerant-Test-Subgraph-freeness has the following guarantees. If is -close to being -free then w.h.c.p. it accepts . If is -far from being -free then w.h.c.p. it rejects .
Proof.
Assume is -close to being -free. If the algorithm accepts in Step 3 then we are done. Otherwise, it follows that .
Let denote the event that the estimator in Step 2 returns a good approximation as described in Definition 8. Conditioned on and the fact that it is implied that (assume otherwise and reach a contradiction to the fact that as ) and hence its return value is in .
Let denote the event that Approximate-Average-MIS-In-Auxiliary-Graph returns a good approximation as described in Claim 14. Conditioned on we obtain that
| (4) |
Since we obtain that and hence the algorithm accepts .
Assume is -far from being -free where . By the setting of it follows that . Thus, . Therefore, conditioned on we obtain that . In particular, it holds that . Hence conditioned on we obtain that Equation 4 holds in this case as well. In particular (since . Thus, conditioned on , the algorithm rejects , as desired.
Claim 16.
The expected query complexity of Tolerant-Test-Subgraph-freeness is bounded above by the query complexity of performing random-copy queries to with parameters and (see Definition 7), all-neighbor queries to , where , and a single execution of a -copies number estimator for (see Definition 8).
Proof.
By construction, linearity of expectation and Theorem 11.
4 The oracles for -cliques
In this section we describe our -copies oracle and -copies number estimator. We conclude with our main theorem about tolerant testing of -freeness (Theorem 21).
Input:
-
1.
Set
-
2.
For
-
(a)
Sample a vertex u.a.r. and a set of indexes, u.a.r. from
-
(b)
If -heavy then set and go to the next iteration.
-
(c)
Otherwise, query for each . Let denote the respective return values.
-
(d)
Let denote the number of -light vertices in .
-
(e)
If then set (there is more than a single -heavy vertex in the set).
-
(f)
If the subgraph induced on is a -clique then set w.p. . Otherwise set .
-
(a)
-
3.
Return .
Claim 17.
For every iteration of Algorithm 4 it holds that:
| (5) |
Proof.
Consider a copy, , of in . Let denote the number of -light vertices in . The probability that is found in Step 2f of Approximate--copies-in- is exactly . Conditioned on the event that indeed was found in Step 2f of iteration the probability that is set to be is . Thus, for every , . The claim follows.
Claim 18.
Approximate--copies-in- is a -copies number estimator of the graph . Its query complexity is .
Proof.
For the claim follows from Claim 16, the setting of and multiplicative Chernoff’s bound (see Theorem 30). To see the correctness of the second part, consider . By the first part of the claim, w.h.c.p. the return value of the algorithm is in . Namely, at most . Now consider the case in which . By a coupling argument, w.h.c.p. the return value is at most in this case. To see this, couple the return value of the algorithm to the return value of the algorithm on the graph after we add copies to the graph (arbitrarily). Since we added copies, the return value of the algorithm on the modified graph dominates the return value on the original graph (to see this consider executing the algorithm in parallel in both graphs - the outcome in the modified graph is always at least as high) but is guaranteed to be at most w.h.c.p. by the above. The claim about the query complexity follows from construction. This concludes the proof.
Input: , ,
-
1.
Repeat times:
-
(a)
Sample a vertex u.a.r. and a set of indexes, u.a.r. from
-
(b)
If is not -light then break.
-
(c)
Otherwise, query for each . Let denote the respective return values.
-
(d)
Let denote the number of -light vertices in .
-
(e)
If or if the subgraph induced on is not a clique, then go to the next iteration.
-
(f)
Otherwise, w.p. , return the -clique induced on and .
-
(a)
Claim 19.
The query complexity of Get-Random--Copy-in- is . If then it returns a copy w.p. at least .
Proof.
For any copy of , , in each iteration of the while-loop, the probability that we return in Step 1f of the algorithm is exactly . Therefore if the number of copies is greater than then the probability that it will return a copy during in a single iteration is at least . The claim follows by the setting of the number of iterations.
Input: ,
-
1.
Set
-
2.
If , reveal the neighbors of all the -light vertices of . Let denote the set of revealed neighbors. If there is a -heavy vertex, , in , perform a pair query between and each one of the vertices in .
-
3.
Otherwise, for each -light vertex, , of : Reveal all the neighbors of in and then for each -light neighbor, reveal all its neighbors.
-
4.
Add all the -copies that were revealed in the previous steps that belong to (namely, that have at most one -heavy vertex) and share an edge with .
-
5.
Return .
Claim 20.
The query complexity of -All-Neighbors-in- is for and for .
Proof.
The claim follows from construction.
Theorem 21.
There exists a tolerant tester for the property of being -free that distinguishes graphs that are -close to being free from graphs that are -far from being -free, where and are parameters. The expected query complexity of the algorithm is where .
Proof.
5 The oracles for -connected motifs of radius
In this section, is a motif which is -connected and has radius . We make the following observation on the copies of in .
Observation 22.
For any copy of , , in , at least one of the following holds:
-
1.
there exists a -light vertex in which is incident to all the vertices in .
-
2.
there is exactly one -heavy vertex in .
Proof.
Let be a vertex in which is incident to all other vertices in (such vertex exists since has radius ). If is -light then Item 1 holds. Otherwise, is -heavy. Since there are no edges in such that both endpoints are -heavy, the claim follows. We say that a BFS is a -restricted if it explores only the neighbors of the -light vertices it reaches. We say that an exploration reveals a copy if all the vertices of appeared in the exploration.
Claim 23.
Let be a motif that is -connected and has radius . In every copy of in , , for every -light vertex in , a -restricted BFS from of depth reveals .
Proof.
If item 1 of observation 22 holds, then clearly the claim follows. Otherwise, there is exactly one -heavy vertex, , in . Since is -connected, after we remove from , remains connected and its diameter is at most . Thus, a -restricted BFS of depth from any -light vertex of reveals all the -light vertices of . The claim follows from the fact that at least one of the -light vertices in is a neighbor of . Therefore, by Claim 23, finding for a -light vertex, , all the -copies in that include , can be done by using queries. Consequently, a query for all-neighbors in can be answered by making queries to . This can be achieved by going over all vertices of the queried copy, , and for each -light vertex, finding all its copies. From the union of these sets we then remove all the copies that do not share an edge with and obtain all the neighbors of in .
To support random copy queries we assign each copy of to its -light vertex of lowest id. To sample a -copy u.a.r. from we first pick a vertex u.a.r., if it is -light then we find all its -copies as described above. We then pick a copy that is assigned to u.a.r. and return it w.p. where denotes the number of copies that are assigned to in and denotes the maximum number of copies that include a specific -light vertex (recall that the copies in are assigned only to -light vertices). Clearly, the above algorithm samples each copy of in w.p. . Consequently, the probability that it returns a copy is . Thus, the success probability of the algorithm depends on . A straightforward upper-bound on is where the first term is due to all possible labeling of the vertices and the second term is due to all possible ways to pick the vertices of the copy from the set of vertices we explored. In the next claim we provide a tighter bound on .
Claim 24.
For any motif , it holds that .
Proof.
Fix an arbitrary labeling of . Let . We next bound . Label the -copies that belongs to according to the fixed labeling of (each copy has its own labeling). If there is more than one such labeling per copy (due to automorphism), pick one arbitrarily. We next describe a one-to-one mapping from the set of labeled copies to a set of sequences of integers. We then bound the cardinality of the latter set and obtain a bound on the number of -copies that belongs to and hence a bound on . Fix a labeled copy of , , and consider the BFS exploration of starting at , where vertices with smaller label are explored first. Given the label of and the sequence of the vertices in by their BFS order we can reconstruct the labels of all the vertices in the copy. The set of vertices and their labels are mapped to a single copy. Moreover, by the above, we can identify the copy by the identity of the root, its label and a sequence of indexes in . Each index indicates the index of the respective element in its parent adjacency-list. The indexes are listed according to the BFS exploration. Since the identity of the root is known, we can inductively reconstruct the tree by the sequence of indexes. Note that the parents are always -light (while the leaves may be -heavy) therefore it suffices to use indexes in . From the above we see that for a specific label of there are at most copies of to which belongs. Since there are possible labels, we get the desired bound.
Input:
-
1.
Set
-
2.
For
-
(a)
Sample a vertex u.a.r.
-
(b)
If is not -light then set and go to the next iteration.
-
(c)
Otherwise, perform a -restricted BFS from to depth and reveal all the -copies assigned to in . Set to be this number.
-
(a)
-
3.
Return .
Claim 25.
Approximate--copies-in- is a -copies number estimator of the graph . Its query complexity is .
Proof.
The claim about the query complexity follows by construction. Consider the random variable which is defined in Approximate--copies-in- . Since and , for , by the setting of and the multiplicative Chernoff’s bound (Theorem 30), w.h.c.p the output of the algorithm is in , as required. For , by a coupling argument, w.h.c.p. the output of the algorithm is at most (see more details in the proof of Claim 14).
Input: , ,
-
1.
Repeat times:
-
(a)
Sample a vertex u.a.r.
-
(b)
If is -heavy then go to the next iteration.
-
(c)
Otherwise, perform a -restricted BFS from to depth and reveal all the -copies assigned to in .
-
(d)
Pick a random copy from the -copies assigned to and return it with probability .
-
(a)
Claim 26.
The query complexity of Get-Random--Copy-in- is . It returns a -copy w.p. at least .
Proof.
The claim about the query complexity follows by construction. In each iteration of the algorithm, each copy of is returned with probability . Thus, in each iteration a copy is returned w.p. . Thus, if , a copy is returned w.p. at least by the setting of the number of iterations.
Input:
-
1.
Set
-
2.
For each vertex of :
-
(a)
Reveal all the -copies of in by performing a -restricted BFS to depth from in .
-
(b)
Add to all the revealed -copies that share an edge with .
-
(a)
-
3.
Return .
Claim 27.
The query complexity of -All-Neighbors-in- is .
Proof.
By Construction.
Theorem 28.
There exists a tolerant tester for the property of being -free for that is -connected and has radius that distinguishes graphs that are -close to being free from graphs that are -far from being -free, where and are parameters. The expected query complexity of the algorithm is where and .
Proof.
6 The tester for the bounded-degree model
Clearly, Tolerant-Test-Subgraph-freeness applies also to bounded degree graphs. Moreover, for the setting where the graph is simply and all the vertices in the graph are -light. Consequently, we can test -freeness for any motif by using the oracles in Section 5 without making any modifications 222In fact the query complexity of Algorithms 7- 9 can be slightly improved by setting the depth of the BFS exploration to be the diameter of (instead of ).. We obtain the following result.
Theorem 29.
There exists a tolerant tester, in the bounded degree model, for the property of being -free for any motif . Specifically the tester distinguishes graphs that are -close to being free from graphs that are -far from being -free, where and are parameters. The expected query complexity of the algorithm is where and is the degree bound.
Proof.
By the proof of Theorem 21.
References
- [1] Noga Alon, Tali Kaufman, Michael Krivelevich, and Dana Ron. Testing triangle-freeness in general graphs. SIAM Journal on Discrete Mathematics, 22(2):786–819, 2008. doi:10.1137/07067917X.
- [2] Andrej Bogdanov, Kenji Obata, and Luca Trevisan. A lower bound for testing 3-colorability in bounded-degree graphs. In 43rd Symposium on Foundations of Computer Science (FOCS 2002), 16-19 November 2002, Vancouver, BC, Canada, Proceedings, pages 93–102. IEEE Computer Society, 2002. doi:10.1109/SFCS.2002.1181886.
- [3] Talya Eden, Reut Levi, and Dana Ron. Testing c_k-freeness in bounded-arboricity graphs. In Karl Bringmann, Martin Grohe, Gabriele Puppis, and Ola Svensson, editors, 51st International Colloquium on Automata, Languages, and Programming, ICALP 2024, July 8-12, 2024, Tallinn, Estonia, volume 297 of LIPIcs, pages 60:1–60:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ICALP.2024.60.
- [4] Talya Eden, Dana Ron, and Will Rosenbaum. The arboricity captures the complexity of sampling edges. In 46th International Colloquium on Automata, Languages, and Programming, ICALP, pages 52:1–52:14, 2019. doi:10.4230/LIPIcs.ICALP.2019.52.
- [5] Talya Eden, Dana Ron, and Will Rosenbaum. Almost optimal bounds for sublinear-time sampling of k-cliques in bounded arboricity graphs. In 49th International Colloquium on Automata, Languages, and Programming, ICALP, pages 56:1–56:19, 2022. doi:10.4230/LIPIcs.ICALP.2022.56.
- [6] Talya Eden, Dana Ron, and C. Seshadhri. Sublinear time estimation of degree distribution moments: The arboricity connection. SIAM J. Discret. Math., 33(4):2267–2285, 2019. doi:10.1137/17M1159014.
- [7] Talya Eden, Dana Ron, and C. Seshadhri. Faster sublinear approximation of the number of k-cliques in low-arboricity graphs. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1467–1478, 2020. doi:10.1137/1.9781611975994.89.
- [8] Eldar Fischer and Lance Fortnow. Tolerant versus intolerant testing for boolean properties. Theory Comput., 2(9):173–183, 2006. doi:10.4086/TOC.2006.V002A009.
- [9] Oded Goldreich and Dana Ron. Property testing in bounded degree graphs. Algorithmica, 32(2):302–343, 2002. doi:10.1007/S00453-001-0078-7.
- [10] L. Gugelmann. Testing triangle-freeness in general graphs: Lower bounds. Bachelor thesis, Dept. of Mathematics, ETH, Zurich, 2006.
- [11] Tali Kaufman, Michael Krivelevich, and Dana Ron. Tight bounds for testing bipartiteness in general graphs. SIAM Journal on Computing, 33(6):1441–1483, 2004. doi:10.1137/S0097539703436424.
- [12] Reut Levi. Testing triangle freeness in the general model in graphs with arboricity . In 48th International Colloquium on Automata, Languages, and Programming, ICALP, volume 198, pages 93:1–93:13, 2021. doi:10.4230/LIPICS.ICALP.2021.93.
- [13] Sharon Marko and Dana Ron. Approximating the distance to properties in bounded-degree and general sparse graphs. ACM Trans. Algorithms, 5(2):22:1–22:28, 2009. doi:10.1145/1497290.1497298.
- [14] Michal Parnas and Dana Ron. Testing the diameter of graphs. Random Structures and Algorithms, 20(2):165–183, 2002. doi:10.1002/RSA.10013.
- [15] Michal Parnas, Dana Ron, and Ronitt Rubinfeld. Tolerant property testing and distance approximation. Journal of Computer and System Sciences, 72(6):1012–1042, 2006. doi:10.1016/j.jcss.2006.03.002.
- [16] T. Rast. Testing triangle-freeness in general graphs: Upper bounds. Bachelor thesis, Dept. of Mathematics, ETH, Zurich, 2006.
- [17] Yuichi Yoshida, Masaki Yamamoto, and Hiro Ito. Improved constant-time approximation algorithms for maximum matchings and other optimization problems. SIAM J. Comput., 41(4):1074–1093, 2012. doi:10.1137/110828691.
Appendix A Appendix
Theorem 30 (Multiplicative Chernoff’s Bound).
Let be identical independent random variables ranging in , and let . Then, for every , it holds that
| (6) |
