Testing Depth First Search Numbering
Abstract
Property Testing is a formal framework to study the computational power and complexity of sampling from combinatorial objects. A central goal in standard graph property testing is to understand which graph properties are testable with sublinear query complexity. Here, a graph property is testable with a sublinear query complexity if there is an algorithm that makes a sublinear number of queries to the input graph and accepts with probability at least , if the graph has property , and rejects with probability at least if it is -far from every graph that has property .
In this paper, we introduce a new variant of the bounded degree graph model. In this variant, in addition to the standard representation of a bounded degree graph, we assume that every vertex has a unique label from , and in addition to the standard queries in the bounded degree graph model, we also allow a property testing algorithm to query for the label of a vertex (but not for a vertex with a given label).
Our new model is motivated by certain graph processes such as a DFS traversal, which assign consecutive numbers (labels) to the vertices of the graph. We want to study which of these numberings can be tested in sublinear time. As a first step in understanding such a model, we develop a property testing algorithm for discovery times of a DFS traversal with query complexity and for constant we give a matching lower bound.
Keywords and phrases:
Randomized Algorithms, Graph Algorithms, Property TestingFunding:
Artur Czumaj: Research partially supported by the Centre for Discrete Mathematics and its Applications (DIMAP), by a Weizmann-UK Making Connections Grant, and by an IBM Award.Copyright and License:
2012 ACM Subject Classification:
Theory of computation Streaming, sublinear and near linear time algorithmsEditors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Depth-first search (DFS) is one of the most useful and frequently used algorithmic primitives in graph algorithms. The DFS algorithm is known for well over a century [17, 23] as a technique for threading mazes and has been widely used in the design of graph and network algorithms since the late 1950s. DFS is a basic tool to traverse a graph in a structured way and is central to solve textbook problems such as connectivity, topological sorting [22], determining strongly connected components in directed graphs [21], biconnected components in undirected graphs [21], and to test planarity [15]. Because of its versatility and usefulness for solving many graph problems, DFS has become one of the most important tools in the design of algorithms for graphs, taught in the first year of many computing science study programs. One of the most useful combinatorial properties of DFS is the DFS numbering of vertices, which is the order in which a DFS algorithm discovers all vertices of the input graph [21]. (See also Algorithms 1 and 2, where denotes the set of neighbors of vertex .)
We remark that sometimes, e.g., in the widely used textbook by Cormen et. al. (see Chapter 20.3 in [5]), the DFS numbering is also used together with finishing numbers. We also remark that the labeling is not unique and depends on the ordering in which the vertices are traversed in the for-loops of Algorithms 1 and 2.
Definition 1 (DFS numberings).
Let be a labeled undirected graph on vertices. A labeling is called a DFS numbering of if DFS gives rise to it for some order of processing the vertices in the for-loops of Algorithms 1 and 2.
Given a wide applicability of DFS and DFS numbering, a natural problem is to verify whether a given graph is correctly DFS-numbered, i.e., the labels assigned to the vertices are a DFS numbering for some ordering of vertices and neighbors. The latter problem is easy to solve by simulating a DFS using the given numbering and locally verifying that the DFS is performed correctly. However, is it also possible to approximately verify whether the graph is properly DFS-numbered using a sampling based algorithm? An answer to this question sheds some light on how the local structure of a DFS-numbered graph (as implicitly given by the sample distribution) is related to the global DFS numbering.
We will study this question in the framework of Property Testing (see, e.g., the monographs [4, 9]). Property Testing provides a formal setting to study approximate decision problems. The goal is to distinguish objects that satisfy the tested property from those that are -far from every object that satisfies the property. Here, -far means that one has to modify more than an -fraction of the object’s description to obtain an object that has the property. A property testing algorithm requires some form of sampling access to the object.
In this paper we will introduce a variant of the standard bounded-degree graph model [12] that in addition to the standard setting, allows also label queries. We will assume that there exists some labeling that assigns unique labels from the set to the vertex set , i.e., the labeling is a bijection . We say that a graph with maximum degree is -far from a DFS-numbered graph, if we have to modify (insert or delete) more than edges to obtain a graph that is correctly DFS-numbered111While one often uses the bound of edges instead of edges, in the setting when these terms differ only by a constant factor and as such there are essentially indistinguishable.. We do not allow to modify labels, though we notice that allowing to modify labels would also be a valid model (see Section 2.1 for some discussion). Our motivation to not consider it here was merely to stick as closely to the standard testing model as possible.
In this new framework we study a fundamental problem of whether the input labeled undirected graph is properly DFS-numbered or it is -far from having a valid DFS numbering. Our main result is a tight (for constant and ) complexity bound for this task. First, we show that any tester for the DFS numbering requires queries, and then, we complement this bound with an algorithm that tests DFS numbering with queries.
The proof of the lower bound (Theorem 7 in Section 4) is by constructing two families and of good and bad random labeled graphs for which we will show that distinguishing between these families is necessary for any DFS-tester. Then we will show that distinguishing between these families requires queries.
The proof of the upper bound (Theorem 11 in Section 5) relies on a characterization of labelings that are -far from a valid DFS numbering. The characterization is described in a form of some conflicts between the labelings and we design two algorithms detecting such conflicts: one for local conflicts and one for global conflicts. By combining these two algorithms we will obtain an algorithm that tests DFS numbering with queries.
1.1 Related work
Property Testing was introduced by Rubinfeld and Sudan [20] and first studied in the dense graph setting by Goldreich and Ron [10]. Constant time testability in the dense graph model has been fairly well understood [2] and is closely related to the regularity lemma. In our paper we build upon the bounded degree graph model, as introduced by Goldreich and Ron [12]. First results in this model included testers for properties such as connectivity, -connectivity and being Eulerian [12]; bipartiteness [11] and cycle freeness [6] can be tested in the bounded degree graph model in time. A series of papers proved that every hyperfinite property of bounded degree graphs is testable in constant time [3, 7, 14, 18]. Also, every constant time testable property in bounded degree graphs is either finite or contains a hyperfinite property [8]. Furthermore, it is known that every first-order logic property of bounded degree graph with an quantification is constant time testable while some properties with a quantification are not [1]. Further works in the bounded degree graph model include, e.g., testability using proximity oblivious testers [13].
2 Preliminaries: The model
In this paper, let denote an undirected graph with vertices and edges. Let denote the set of neighbors of , i.e., , and be an upper bound on the maximum number of edges incident to any vertex in . We will assume that is a labeled graph, i.e., there is a bijection (permutation) that assigns numbers to the vertices of and we use .
2.1 Bounded degree graph model for labeled graphs
In this section we describe how our algorithm can access the input graph of maximum degree with vertex labeling . Our model is a slight modification of the standard bounded degree graph model [12] that in addition allows access to labels. As usual in this model, we assume that and as well as are given to the algorithm. The algorithm can ask the following two types of queries and receives an answer in constant time:
-
Neighbor queries: for every vertex and every , one can query the th neighbor of vertex .
-
Label queries: for every vertex , one can query the label of , .
Observe that we allow access to vertices and their neighbors, and we can check the label of any vertex , but we have no access to vertices through their labels . In particular, to access vertex with a given label , we have to query the oracle until such vertex is returned. This is a special feature distinguishing our model from the standard model used in graph property testing and making the problem challenging: we have a labeled graph, but we cannot access the graph (its vertices) through the labels! Instead, the only way to access the graph is by taking any vertex and either querying its label (via the label query) or querying its th neighbor (via a neighbor query).
The query complexity of a testing algorithm is the number of oracle queries.
A property testing algorithm (in short, property tester) for DFS-numbered graphs is an algorithm that has access to an input graph as described above and that accepts the input with probability at least , if is a graph with being a valid DFS-numbering. The algorithm rejects with probability at least if is -far from having a valid DFS numbering according to the following definition. (The algorithm developed in this paper is a property tester with one-sided error, i.e., it always accepts, if is a DFS-numbering.)
Definition 2 (-far from DFS numberings in bounded degree graphs).
Let be an undirected graph on vertices with maximum degree at most and let be a bijection that assigns labels to . We say the labeled graph is -far from having a valid DFS numbering, if one has to modify222Modification of the edges means edge insertions and deletions, i.e., we require , where is the symmetric difference. more than edges in to obtain a graph of maximum degree at most for which is a valid DFS numbering.
Implicitly labeled graphs.
To simplify the presentation, we will often assume . However, we will not use this knowledge in the algorithm as the model prevents us from querying . Our tester will only ask for a random vertex or for a vertex that occurred as the neighbor of a previously queried vertex.
Further thoughts about the model: Modifying the labels.
Observe that in general, it might be natural to consider a revised definition of a labeling being -far from a valid DFS numbering, where while defining graph in Definition 2, in addition to edge modifications, one would allow also for modifications of the labels333We use the following revised definition: a labeling is -far from a valid DFS numbering of if for any graph of maximum degree at most with a valid DFS numbering we have .. We observe that for bounded degree graphs, adding the labels modifications does not change the problem significantly.
Lemma 3.
Let be a permutation of to . For a given bounded degree graph , a labeling is -far from a valid DFS numbering according to the edges modifications if and only if is -far from a valid DFS numbering according to the edges and labels modifications.
Proof.
Observe that the number of modifications of the edges to obtain a valid DFS numbering is not smaller than the number of modifications of the edges and the labels to obtain a valid DFS numbering. Therefore, if for a given bounded degree graph , a labeling is -far from a valid DFS numbering according to the edges and labels modifications then is also -far from a valid DFS numbering according to the edges modifications.
For the other direction, if is -far from a valid DFS numbering according to the edges modifications then we can simulate modification of the labels by modification of the edges: to assign a given label to a vertex we just remove all its incident edges and add all edges incident to the vertex with the label sought. (Observe that a vertex might have had some edges of its own that have to be removed but by correcting the labels one by one, we can ensure that once we fix vertex by assigning it to label with vertex having label before, then later we will have to fix the label of vertex , resolving the issue.) Observe that if we apply this operation to all vertices with the labels to be changed, then we do not increase the maximum degree, and hence, modifying of labels can be simulated by modifying at most edges. Therefore, if a labeling is -far from a valid DFS numbering according to the edges modifications then is also -far from a valid DFS numbering according to the edges and labels modifications.
Why should the labels be a permutation?
Our model assumes that the input labeling is a permutation (bijection) of to , but it may be natural to consider the case when is not necessarily a permutation but rather an arbitrary function from to , allowing repetitions of labels. Observe that already the simple problem of testing if is a permutation or is -far from being a permutation (also known as the element distinctness problem) is known to require queries (see, e.g., [19]). In view of that bound, the best what we could hope for without assuming that is a permutation of to is to achieve query complexity of . And this can be easily obtained by combining our algorithm in Theorem 11 with the known algorithms testing element distinctness, leading to a testing algorithm with queries.
3 Basic Properties of DFS Numberings
We begin with a review of basic DFS terminology, introduce some notions of our own, and make a few useful observations. (We also refer to standard textbooks, e.g., [5, 16].)
The DFS algorithm, as given in Algorithm 1, numbers every vertex even if is disconnected due to the outer loop over all vertices. Every vertex is initially undiscovered. When DSFVisit() is called, becomes discovered. We say is active as long as the call DSFVisit() persists and is finished as soon as it ends. The vertex that initiates the call of DSFVisit() is the parent of . If the call of DSFVisit() is initiated from the outer loop then we say that is an orphan with virtual parent . The idea is that the outer loop iterating over all vertices is like a call to DSFVisit() where is a virtual vertex connected to all other vertices. Each connected component of has exactly one orphan, which receives the smallest number in its connected component. At any point during the execution the white path consists of all active vertices including the virtual vertex , with every active vertex (other than ) connected to its parent. The DFS numbers appear in increasing order along the white path. In an implicitly labeled graph , we define for as
Claim 4.
If the numbers in an implicitly labeled graph correspond to a DFS numbering then is the (possibly virtual) parent of .
Proof.
If is an orphan then it has the smallest number in its connected component so . Hence , which is the virtual parent of orphans by definition.
Now assume was discovered from a non-virtual parent . Then with so . To see that is the maximum of , consider . Immediately before is discovered, is at the end of the white path, hence the largest active vertex. Since has been discovered, must be finished. Hence all neighbours of have been discovered. Since is not discovered we have so .
Observe that any edge with in an implicitly labeled graph corresponds to an interval representing a range of elements with respect to the DFS numbering. The analysis of the inter-relation between such intervals plays a central role in our analysis.
Definition 5.
A pair is a conflicting pair if .
The following central lemma shows that the absence of conflicting pairs is necessary and sufficient for the validity of a DFS numbering (we defer a simple proof to the full version).
Lemma 6.
Let be an implicitly labeled graph. The following are equivalent.
-
(i)
has a valid DFS numbering.
-
(ii)
There exists no conflicting pair.
4 Testing DFS Numbering Requires Queries
In this section, we prove our first main result.
Theorem 7.
Every property tester for the property of having a valid DFS-labeling has a query complexity of .
Let be sufficiently small (any would do). The proof of Theorem 7 is by constructing two families and of good and bad random labeled graphs for which we will show that distinguishing between these families is necessary for any DFS-tester. Then we will show that distinguishing between these families requires queries.
4.1 Construction of good and bad random labeled graphs and
Let . Each graph and consists of arms of vertices each. The roots of these arms are connected with some binary tree that is the same for and .
There are four types of arms, as described in details in Figure 1.
Each arm of is a copy of or chosen independently and uniformly at random. Similarly, each arm of is a copy of or chosen independently and uniformly at random.
The labeling of is then obtained by starting in the root of the tree and using within the arms the relative ordering defined by the numbering of the arms or . For an example, see, e.g., Figure 3. (Since each arm has the same number of vertices, the choice of one arm, whether in or , does not affect the numbering of other arms.)
Similarly (see Figure 4), the labeling of is defined by starting at a root of and using within the arms the relative ordering defined by the numbering of the arms or .
4.2 Properties of DFS numberings of random labeled graphs and
Let us first notice that our construction of good random labeled graphs easily ensures that they have valid DFS numberings (with probability 1).
Lemma 8.
has a valid DFS numbering. ∎
A situation is different for bad random labeled graphs . While the arms of type also locally maintain a valid DFS-order, the arms of type do not (because of the two comb graphs). As the result, the construction of typically gives labelings that are -far from valid DFS numberings.
Lemma 9.
Let and let be sufficiently large. Then has a labeling that is -far from a valid DFS numbering with probability .
Proof.
Consider the numbering for . Let us fix any and focus on vertices with labels , respectively. (For example, in Figure 4 in the top branch (where numbers have an offset of ), if we took then we would have to be vertices with labels in .) Observe that the fact that implies that this is not a valid DFS numbering as long as is the DFS-child of and is the DFS-child of . Therefore, to turn the labeled graph into one consistent with DFS numbering, one has to add or delete at least one edge incident to a vertex from . Since such a quadruple is obtained for each and since each of these quadruples is disjoint (for example, in Figure 4, the there are four such quadruples formed by vertices with labels , , , and ), one needs to modify at least edges to obtain a valid DFS numbering. Since the expected number of copies of in is , the expected number of changes required in to obtain a valid DFS numbering is at least . For sufficiently large , a standard Chernoff bound implies that graph is -far from having a valid DFS numbering with high probability.
4.3 Hardness of distinguishing between good and bad labeled graphs
Our next central lemma provides a lower bound for the number of queries of any algorithm ALG that distinguishes between the families of random labeled graphs and . We will assume that the input for on which ALG requires oracle queries is obtained by first selecting a random bit and then setting
Lemma 10.
Let . For every randomized algorithm ALG’ that receives as input, performs queries, and outputs a bit , we have .
Proof.
Let ALG’ be an algorithm that receives as its input. We assume that ALG’ can query the oracle for any vertex by submitting an ID of , and the oracle returns the label of , and the IDs of all neighboring vertices. Further, without loss of generality, we assume that the IDs are randomly assigned to the vertices of the graph. Therefore ALG’ is limited to querying for a random vertex (a random query) or for a vertex that has been previously found as a neighbor of an earlier queried vertex (an explorative query).
Without loss of generality, we can assume that the vertices selected by random queries are chosen in the order before the run of ALG’; let be the sequence of these first random vertices (ALG can select only these vertices). Let be the random event that no two vertices come from the same arm of . We prove the following:
| (1) | |||
| (2) |
In order to prove inequality (1), let us first define to be the number of pairs with such that both and come from the same arm of , or in other words, so that and belong to the same copy of arm . We have,
Therefore we can conclude (1) by Markov’s inequality: .
Next, we want to prove inequality (2). Let be the set of vertices reachable from vertices in in at most steps, that is, . Let be the subgraph of induced by the vertex set . We observe that ALG’ will make its decision solely on seeing some subgraph of . Hence, the output of ALG’ is a (random) function of . Now, we claim that conditioned on , the random variables and are stochastically independent, which in turn, would imply identity (2).
To prove that and are independent, let us consider a single vertex from and the subgraph induced by vertices with distance at most from . If contains at least one vertex from , then consists solely of some of the vertices from and some vertices that are close to the roots of some of the arms (within parts denoted by of the arms, since each such part has length and we have since and ). Since these parts are identical in and , such paths share no information on .
Otherwise, if contains no vertex from , then is a part of an arm of . But then, due to , at most one joint444By a joint we mean an endpoint of a path or comb from which the arm was stitched together. of this arm is contained in . Since the labels of the roots of the arms are the same in and , it is always well-defined what the offset of a label within its arm is. For more details, see the full version of the paper.
4.4 Hardness of testing DFS numbering (proof of Theorem 7)
By Lemmas 8 and 9, any algorithm ALG that accepts a labeled graph with a valid DFS numbering with probability at least and rejects a labeled graph with a DFS numbering that is -far (for ) from being valid DFS numbering with probability at least , must be able to distinguish with probability at least between the families of good and bad random labeled graphs and . However by Lemma 10, by setting and , the tasks of distinguishing between these families requires queries. ∎
5 Testing DFS Numbering with Queries
Our second main result shows that the lower bound in Theorem 7 is asymptotically tight.
Theorem 11.
Let . There is an algorithm that with oracle access to a labeled bounded degree undirected graph on vertices, performs queries to the oracle and accepts, if has a valid DFS numbering, and rejects with probability at least , if is -far from having a valid DFS numbering.
(One can show that our tester achieves also the same running time.)
For the rest of the section, let be a corresponding labeled graph equipped with a possibly inappropriate DFS numbering . As in Section 3, we assume that is implicit so that we can, for instance, write for instead of .
Outline.
Our approach to prove Theorem 11 is first to extend Lemma 6 (which characterizes valid DFS numberings) to describe a simple and useful property of labelings that are -far from a valid DFS numbering. In Lemma 13 in Section 5.1, we will show that if the numbering is -far from a valid DFS numbering then not only we have conflicting pairs (in the sense of Lemma 6), but in fact we have conflicting pairs that are “unrelated.” Once we have this property, the task in hand will be to detect any of such conflicting pair. We observe that there are two types of conflicting pairs, local pairs involving vertices whose DFS numbers are close to each other, and global conflicts. In order to detect local conflicts, we first develop (in Section 5.2) some basic tools to traverse a given graph following DFS numbering. Once we know how to traverse the graph, we can design an algorithm that can determine if a given pair is conflicting pair. Unfortunately, this algorithm is efficient only if vertex or is close to or (that is, if one of , , is small), and so we can use this approach to deal with local conflicts. In order to study global conflicts, we notice that if vertices and are far away from vertices and , then in fact a conflicting pair can be also extended to multiple vertices . Once we have that property, we will show that if we sample randomly vertices and edges, then if there were many global conflicts, then there would be one that is determined by one of the sampled vertices and one of the sampled edges.
5.1 Properties of labelings that are -far from any valid DFS numbering
We begin with describing a useful property of labelings that are -far from a valid DFS numbering that they have conflicting pairs that are “unrelated.” Recall that by Lemma 6 a numbering is a valid DFS numbering if and only if there is no conflicting pair with . In that case, when , we speak of a conflict involving vertex and edge . While Lemma 6 characterizes valid and invalid DFS numberings, we will need a stronger claim about properties of numberings that are -far from valid DFS numberings. For that, we need to understand numberings that are not -far from valid DFS numberings because we can modify the input graph with at most edge modifications to ensure that the resulting graph will have a valid DFS numbering.
Observe that Lemma 6 provides a simple tool to edit the input labeled graph to obtain a valid DFS numbering – to remove all conflicting pairs. With that in mind, the following claim provides a simple fix to remove all conflicts involving a specific vertex or all conflicts involving a specific edge (notice that the resulting graph may violate our degree bound , but one can address this issue using a “degree reduction framework,” see the full version.
Lemma 12.
Let and with .
-
(i)
Adding the edge to (and doing nothing if is already present) does not create any new conflicts and resolves all conflicts involving .
-
(ii)
Removing from and adding (if not already present) does not create any new conflicts and resolves all conflicts involving .
Proof.
-
(i)
After the edit we have so no can satisfy any more, meaning all conflicts involving are resolved. We do not create any new conflicts because does not change and the new edge cannot be involved in a conflict because (again) no can satisfy .
-
(ii)
Deleting clearly resolves all conflicts of this edge. However, new conflicts involving may arise since may change if we had before. By (i) we can fix these by adding .
Lemma 12 shows that adding or removing a few edges can resolve many related conflicts. We use this claim to show that in order for to be -far from having a valid DFS numbering, not only do there have to be many conflicts, but in fact there have to be many mutually unrelated conflicts. To formalize this idea we consider the bipartite conflict graph where contains an edge precisely if it is a conflicting pair. The intuition of mutually unrelated conflicts corresponds to a matching in . We have the following.
Lemma 13 (-far DFS numberings).
If is -far from having a valid DFS numbering then there is a matching of size in .
Proof.
We will prove Lemma 13 by showing that if is -far from having a valid DFS numbering then has a vertex cover of size at least , for if not, then we could modify at most edges of to make the labeling to be a valid DFS numbering of the resulting graph. Then Lemma 13 follows from König’s theorem.
Let be a maximum matching in . By Kőnig’s theorem there is a vertex cover of size , i.e., a set of vertices and edges of (vertices of ) such that for every conflict pair (for every edge of ) or . We can fix all conflicts in in edits by applying for each the fix of Lemma 12 (i) (one edit) and for each the fix of Lemma 12 (ii) (two edits). We obtain a graph with a valid DFS numbering. However, the vertices that appeared in a fix in the role of or may have degree in , higher than permitted. By our “degree reduction framework” (see full version), another edits suffices to transform into with maximum degree while maintaining the validity of the DFS numbering. Overall and since is -far we have . Hence .
Lemma 13 immediately implies a simple tester detecting -far instances: we randomly sample vertices and edges, and then with a constant probability one of the sampled vertices and one of the sampled edges will form a conflicting pair .
Corollary 14.
There is an algorithm that with oracle access to a labeled bounded degree graph on vertices, performs queries to the oracle and accepts, if has a valid DFS numbering, and rejects with probability , if is -far from having a valid DFS numbering. ∎
In what follows, we will show how to improve this result, as promised in Theorem 11.
5.2 Navigating (would-be) DFS trees
In this section, we develop tools to find conflicting pairs for vertices. We first show how to traverse a graph using consecutive labels (a simple proof is deferred to the full version).
Fact 15.
Let be an ordered tree of bounded degree and its canonical DFS ordering555The DFS starts at the root and respects the order of children when visiting them in pre-order traversal.. For a randomly selected vertex from , given oracle access to , it is possible to locate the successor (and the predecessor) of in , if one exists, with queries to in expectation.
Using Fact 15, we can prove the following lemma.
Lemma 16 (dfs-next).
There is an algorithm dfs-next that for a given vertex :
-
If the numbering of is a valid DFS numbering then either vertex is returned or end-of-component is returned if is the largest vertex in its connected component.
-
The expected (over vertices in ) number of queries to of algorithm dfs-next is .
There also exists an algorithm dfs-prev with corresponding properties.
Proof.
Let be the ordered tree on rooted at with the edges for and children ordered ascending by number. If the numbering of is a valid DFS numbering then (by Claim 4) is precisely the DFS tree that gave rise to the numbering. We can navigate to successors and predecessors in (by Fact 15) with an expected number of queries to per step. The only problem is that the virtual vertex cannot be accessed and has unbounded degree. Whenever we would have to access it, we return end-of-component instead, which is appropriate because a valid DFS backtracks to precisely if a connected component has been fully explored. (If the numbering of is invalid, the produced answer may be meaningless, but it is still obtained within the claimed expected time budget.)
5.3 Testing for conflicts
Lemma 13 implies that an -far instance has pairwise distinct vertices and edges involved in conflicts. In this section, we will extend that approach and study separately local conflicting pairs and global conflicting pairs in order to improve the tester from Corollary 14. This notion depend on a locality parameter that we will later choose as .
Definition 17.
Let with be a conflicting pair. We speak of a local conflict of the following (not mutually exclusive) types (L1) if and , (L2) if , (L3) if . In all other cases we speak of a global conflict: (G) if and ; or if .
Informally, local conflicts occur when is close (in the sense of its DFS number) to or , or is close , in which case one can traverse the input graph to efficiently detect the conflict. For global conflicts, some more global approach will be needed.
5.3.1 Testing for local conflicts
We can detect local conflicts by sampling vertices at random and walking forwards and backwards in the (supposed) DFS order using Lemma 16 as follows.
Lemma 18.
There is an algorithm that with queries in expectation accepts all valid DFS numberings and rejects with probability if there is a matching of size at least consisting of conflicting pairs of type (L1). The same applies to types (L2), (L3).
Proof.
Consider the following procedure walk-from- that is given as input. It first checks if . If so, it attempts to locate vertices using dfs-next from Lemma 16 until one of the following happens.
-
If dfs-next reaches vertices out of order, then report the error.
-
If dfs-next reaches a vertex that has a neighbour then report the error.
-
If or is reached without finding an error, then conclude that is not involved in a conflict of type (L1).
Clearly walk-from- finds an error if is involved in a conflict of type (L1) and by Lemma 16 it performs queries in expectation (over all ). Our algorithm to detect conflicts of type (L1) is now simply to repeat walk-from- for many sampled uniformly at random. Since the vertices matched in are pairwise distinct, a random vertex from is involved in a conflict of type (L1) with probability . If we sample vertices at random then the probability of sampling at least one involved in a conflict of type (L1) is
The expected total number of queries amounts to , which concludes the claim.
For conflicts of type (L2) a similar procedure walk-backwards-from- works. For conflicts of type (L3) a procedure walk-forwards-from- would not work because dfs-next might report end-of-component before is reached (and without us being aware of ’s existence). A procedure walk-backwards-from- does work, however. Note that we have to sample edges uniformly at random which is still because is constant.
5.3.2 Testing for global conflicts
Lemma 18 provides an efficient tool to detect local conflicts but for global conflicts we use a different approach. We rely on Lemma 13 that promises that there is a matching of size at least consisting of conflicting pairs. Therefore, if most of conflicts defining the matching are global, we can use the following lemma.
Lemma 19.
There is an algorithm that performs queries in expectation and accepts all valid DFS numberings and that rejects with probability at least if there is a matching of size at least consisting of conflicting pairs of type (G).
Proof.
Let us partition matching into strips according to the number of , so that . Let us define sets and . Let be the median of . Let be those vertices from that are at most and those edges from matched to a vertex that is at least . Let and note that and that .
We claim that every pair in is a conflicting pair. Indeed, let , with , and let and be the corresponding pairs in . Then we observe the following:
To see this, we notice the following:
-
(1)
is of type (G). Hence , unless ; moreover ;
-
(2)
since both and are in , we have ;
-
(3)
by the definition of and the values of and fall on the respective side of the median ;
-
(4)
is a conflicting pair.
In view of the above, since every pair in is a conflicting pair, we observe that over all we do not just have conflicting pairs to work with, but a collection of bicliques with conflicting pairs in total. If we happen to get hold of a and an for the same , then we have a conflicting pair .
In order to find a conflicting pair using the approach above, let us sample (i.u.r.) vertices from and then (i.u.r.) edges from , where will be set up momentarily.
For any , let to be the indicator random variable that the th sampled element from and the th sampled element from are (respectively) from the sets and for the same . Observe that if we define random variable as , then by the arguments above, if is positive then we have detected a conflicting pair. Using the second moment method, we get prove the following lemma.
Lemma 20.
Let for a sufficiency large constant and let . Then .
Hence, if we sample i.u.r. at least vertices from and at least edges from for a sufficiently large constant , then with a constant probability we will detect a conflicting vertex. At the same time, if the DFS numbering is valid then the algorithm will accept it. This yields Lemma 19 since in our setting and .
5.4 Putting all together: the proof of Theorem 11
Now we are ready to complete the proof of Theorem 11. Our algorithm runs the algorithms from Lemma 18 (responsible or conflicts of type (L1), (L2) and (L3)) and from Lemma 19 one after the other. If any of them rejects then we reject , otherwise we accept . The expected total running time is , which with gives as claimed. (The query complexity can be made using Markov inequality.)
Concerning correctness, it is clear that instances with valid DFS numberings are always accepted. If is -far from a valid DFS numbering then by Lemma 13 there is a matching of conflicting pairs. Since each matching edge falls into (at least) one type, at least one of the following statements holds.
-
There is a matching of conflicting pairs of type (L1).
-
There is a matching of conflicting pairs of type (L2).
-
There is a matching of conflicting pairs of type (L3).
-
There is a matching of conflicting pairs of type (G).
In each case, the corresponding algorithm rejects with probability so overall we reject with probability at least . ∎
6 Conclusions
In this paper we introduced a variant of the standard bounded-degree graph model in the property testing setting that works for labeled graphs and allows also label queries. We demonstrated the strength of the model on our new study of DFS numbering. Our main technical contribution is a tight analysis for detecting whether the input labeled graph is properly DFS-numbered or it is -far from having a valid DFS numbering. We demonstrated that this task can be solved with queries and also queries are necessary.
We observe that while our analysis is presented for undirected graphs, similar arguments hold also for directed graphs. The lower bound from Theorem 7 trivially extends to directed graphs and a careful pass through the algorithm in Theorem 11 shows that the analysis can be extended accordingly. However, to implement this algorithm efficiently in our setting, we need to allow access to incoming and outgoing edges. (See the full version of the paper.)
Our analysis can be also extended (but only for undirected graphs) to the DFS finishing numbers (FIN-numberings [5]). We can show that (for undirected graphs) a numbering is a valid FIN numbering iff the reverse numbering with is a valid DFS numbering. This immediately implies that our results for testing valid DFS numberings extend to testing valid FIN numberings in undirected graphs.
References
- [1] Isolde Adler, Noleen Köhler, and Pan Peng. On testability of first-order properties in bounded-degree graphs and connections to proximity-oblivious testing. SIAM Journal on Computing, 53(4):825–883, 2024. doi:10.1137/23M1556253.
- [2] Noga Alon, Eldar Fischer, Ilan Newman, and Asaf Shapira. A combinatorial characterization of the testable graph properties: It’s all about regularity. SIAM Journal on Computing, 39(1):143–167, 2009. doi:10.1137/060667177.
- [3] Itai Benjamini, Oded Schramm, and Asaf Shapira. Every minor-closed property of sparse graphs is testable. Advances in Mathematics, 223(6):2200–2218, 2010.
- [4] Arnab Bhattacharyya and Yuichi Yoshida. Property Testing – Problems and Techniques. Springer Verlag, 2022. doi:10.1007/978-981-16-8622-1.
- [5] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 4th Edition. MIT Press and McGraw-Hill, 2022.
- [6] Artur Czumaj, Oded Goldreich, Dana Ron, C. Seshadhri, Asaf Shapira, and Christian Sohler. Finding cycles and trees in sublinear time. Random Structures & Algorithms, 45(2):139–184, 2014. doi:10.1002/RSA.20462.
- [7] Artur Czumaj, Asaf Shapira, and Christian Sohler. Testing hereditary properties of nonexpanding bounded-degree graphs. SIAM Journal on Computing, 38(6):2499–2510, 2009. doi:10.1137/070681831.
- [8] Hendrik Fichtenberger, Pan Peng, and Christian Sohler. Every testable (infinite) property of bounded-degree graphs contains an infinite hyperfinite subproperty. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 714–726, 2019. doi:10.1137/1.9781611975482.45.
- [9] Oded Goldreich. Introduction to Property Testing. Cambridge University Press, 2017. doi:10.1017/9781108135252.
- [10] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653–750, 1998. doi:10.1145/285055.285060.
- [11] Oded Goldreich and Dana Ron. A sublinear bipartiteness tester for bunded degree graphs. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC), pages 289–298, 1998. doi:10.1145/276698.276767.
- [12] Oded Goldreich and Dana Ron. Property testing in bounded degree graphs. Algorithmica, 32(2):302–343, 2002. doi:10.1007/S00453-001-0078-7.
- [13] Oded Goldreich and Dana Ron. On proximity-oblivious testing. SIAM Journal on Computing, 40(2):534–566, 2011. doi:10.1137/100789646.
- [14] Avinatan Hassidim, Jonathan A. Kelner, Huy N. Nguyen, and Krzysztof Onak. Local graph partitions for approximation and testing. In Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 22–31, 2009. doi:10.1109/FOCS.2009.77.
- [15] John E. Hopcroft and Robert E. Tarjan. Efficient planarity testing. Journal of the ACM, 21(4):549–568, 1974. doi:10.1145/321850.321852.
- [16] Jon M. Kleinberg and Éva Tardos. Algorithm Design. Addison-Wesley, 2013.
- [17] Édouard Lucas. Récreations Mathématiques. Librairie Albert Banchard, Paris, 1882.
- [18] Ilan Newman and Christian Sohler. Every property of hyperfinite graphs is testable. SIAM Journal on Computing, 42(3):1095–1112, 2013. doi:10.1137/120890946.
- [19] Sofya Raskhodnikova, Dana Ron, Amir Shpilka, and Adam D. Smith. Strong lower bounds for approximating distribution support size and the distinct elements problem. SIAM Journal on Computing, 39(3):813–842, 2009. doi:10.1137/070701649.
- [20] Ronitt Rubinfeld and Madhu Sudan. Robust characterizations of polynomials with applications to program testing. SIAM Journal on Computing, 25(2):252–271, 1996. doi:10.1137/S0097539793255151.
- [21] Robert Endre Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146–160, 1972. doi:10.1137/0201010.
- [22] Robert Endre Tarjan. Edge-disjoint spanning trees and depth-first search. Acta Informatica, 6:171–185, 1976. doi:10.1007/BF00268499.
- [23] G. Tarry. Le problème des labyrinthes. Nouvelles Annales de Mathématiques, 3e série, 14:187–190, 1895.
