Graph Reconstruction via MIS Queries
Abstract
In the Graph Reconstruction (GR) problem, a player initially only knows the vertex set of an input graph and is required to learn its set of edges . To this end, the player submits queries to an oracle and must deduce from the oracle’s answers.
Angluin and Chen [Journal of Computer and System Sciences, 2008] resolved the number of Independent Set (IS) queries necessary and sufficient for GR on -edge graphs. In this setting, each query consists of a subset of vertices , and the oracle responds with a boolean, indicating whether is an independent set in . They gave algorithms that use IS queries, which is best possible.
In this paper, we initiate the study of GR via Maximal Independent Set (MIS) queries, a more powerful variant of IS queries. Given a query , the oracle responds with any, potentially adversarially chosen, maximal independent set in the induced subgraph .
We show that, for GR, MIS queries are strictly more powerful than IS queries when parametrized by the maximum degree of the input graph. We give tight (up to poly-logarithmic factors) upper and lower bounds for this problem:
-
1.
We observe that the simple strategy of taking uniform independent random samples of and submitting those to the oracle yields a non-adaptive randomized algorithm that executes queries and succeeds with high probability. This should be contrasted with the fact that IS queries are required for such graphs, which shows that MIS queries are strictly more powerful than IS queries. Interestingly, combining the strategy of taking uniform random samples of with the probabilistic method, we show the existence of a deterministic non-adaptive algorithm that executes queries.
-
2.
Regarding lower bounds, we prove that the additional factor when going from randomized non-adaptive algorithms to deterministic non-adaptive algorithms is necessary. We show that every non-adaptive deterministic algorithm requires queries. For arbitrary randomized adaptive algorithms, we show that queries are necessary in graphs of maximum degree , and that queries are necessary, even when the input graph is an -vertex cycle.
Keywords and phrases:
Query Complexity, Graph Reconstruction, Maximal Independent Set QueriesCopyright and License:
2012 ACM Subject Classification:
Theory of computation Design and analysis of algorithmsEditors:
Raghu MekaSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Query algorithms for graph problems have recently received significant attention. In this setting, algorithms are granted access to the input graph solely via a (usually easy-to-compute) subroutine, which is referred to as the oracle, and the complexity of an algorithm is measured by the number of subroutine/oracle calls.
In the literature, a large number of different query models have been considered. Queries can either be local or global, depending on whether they reveal only local information, e.g., vertex degree queries [13] or queries that, for any , reveal the th neighbour of a vertex [17], or global information, e.g., (bipartite) independent set queries [7, 10, 1, 3] or maximal matching queries [12, 20]. Depending on the application, it is generally desirable to obtain non-adaptive query algorithms, i.e., algorithms where the different queries do not depend on each other’s outcomes, since such queries can be executed simultaneously and therefore admit straightforward parallel implementations. We also say that a query algorithm requires rounds of adaptivity, for some integer , if the queries executed by the algorithm can be partitioned into groups such that the queries in the th group only depend on the outcomes of the queries in groups .
Graph Reconstruction.
In this work, we consider query algorithms for the Graph Reconstruction (GR) problem. In GR, a player initially only knows the vertex set of a graph and is tasked with learning the edge set , by submitting a sequence of adaptive or non-adaptive queries to the oracle, and by deducing the edge set from the oracle’s answers.
Query algorithms for GR have been extensively studied under various query models, including Independent Set (IS) queries [18, 11, 6, 5, 7, 1], Distance queries [24, 19, 21, 25], Betweenness queries [2, 26], and in the setting where an algorithm receives random vertex-induced subgraphs or submatrices of the adjacency matrix [22]. Very relevant to our work are IS queries, where a query consists of a subset of vertices , and the oracle returns a boolean, indicating whether is an independent set in . IS queries for GR were originally studied for recovering simple graphs, such as matchings and Hamiltonian cycles [18, 11, 6], since these settings have direct applications to genome sequencing. Angluin and Chen [7] were the first to consider general graphs and showed that -edge graphs can be reconstructed with IS queries via an adaptive deterministic algorithm or a randomized algorithm with limited adaptivity (see also [1] who reduced the number of adaptive rounds), and this is also best possible.
MIS Queries.
In this work, we initiate the study of GR under Maximal Independent Set (MIS) queries. In this setting, similar to IS queries, a query also consists of a subset of vertices . The oracle, however, responds with any, potentially adversarially chosen, maximal independent set in the subgraph induced by the queried vertices .
MIS queries are global queries and are similar in spirit to the Maximal Matching queries considered in [12, 20], where the oracle returns a maximal matching in the subgraph induced by the query vertices [12], or in the subgraph spanned by the queried edges that are also contained in the input graph [20]. They are at least as powerful as IS queries since an IS query can be answered by an MIS oracle by exploiting the connection:
We observe that computing an MIS is a simple task that can be solved in linear time by a Greedy algorithm and even in sublinear time on graphs of bounded-neighborhood independence [9]. The MIS problem can also be solved efficiently in many restricted computational models, e.g., an MIS can be computed in rounds in the Congested-Clique model [16], in rounds in the LOCAL model of distributed computing [15, 27], and in passes in the semi-streaming model [4, 8]. This task can therefore be regarded as a building block of more elaborate algorithms and may be outsourced to an oracle that maintains an efficient implementation.
Our key motivation is to answer the question as to whether MIS queries are strictly more powerful than IS queries, i.e., whether enhancing the oracle answer by providing a maximal independent set rather than just indicating whether the query set is independent yields significant savings in the number of queries required.
1.1 Our Results
In this paper, we show that MIS queries can be strictly more powerful than IS queries, but this is also not always the case.
As previously mentioned, Angluin and Chen showed that IS queries are sufficient for reconstructing a graph on edges. One of our lower bound results (Theorem 14) implies that there are -edge graphs that require MIS queries to be reconstructed, which show that, up to a logarithmic factor, MIS queries are not more powerful than IS queries on the class of -edge graphs. However, when considering the class of graphs with maximum degree , for some integer , significant savings can be achieved:
| Algorithm | Lower Bound | |
|---|---|---|
| Randomized | (Theorem 1) | (Theorems 13 and 14) |
| Deterministic | (Corollary 7) | (Corollary 12) |
Algorithms.
We give a randomized algorithm for reconstructing an -vertex graph with maximum degree that uses non-adaptive queries and succeeds with high probability111As it is standard, we say that an event related to the input graph occurs with high probability if the probability is , where . (Theorem 1). This result shows that, for the class of graphs of maximum degree , MIS queries are stronger than IS queries since an information-theoretic lower bound similar to the one given in [7] shows that IS queries are needed to reconstruct a graph with maximum degree (Corollary 16 in Appendix A). Observe that, for the class of constant degree graphs, MIS queries are sufficient, but IS queries are necessary.
We also investigate whether randomization is necessary to obtain algorithms with low query complexity. Using the probabilistic method, we show that there exists a non-adaptive deterministic query algorithm that executes queries (Corollary 7).
Our non-adaptive randomized and deterministic algorithms assume that the maximum degree is known in advance. This cannot be avoided since, as proved in Theorem 14, any algorithm that executes queries cannot solve all instances with . We show that the non-adaptive algorithms above can be turned into adaptive algorithms with query complexities that are by at most a constant factor larger. These adaptive algorithms do not require any information about in order to operate and only require rounds of adaptivity (Corollary 9).
Lower Bounds.
We give lower bounds on the number of queries required by both non-adaptive deterministic algorithms and adaptive randomized algorithms.
First, we show that every non-adaptive deterministic query algorithm requires queries (Corollary 12), which renders our deterministic algorithm optimal, up to poly-logarithmic factors. This result together with our non-adaptive randomized algorithm (Theorem 1) establishes a separation result between non-adaptive randomized and non-adaptive deterministic algorithms since our randomized algorithm only requires queries.
Next, we show that every adaptive randomized query algorithm requires queries, even if the input graph is guaranteed to be an -vertex cycle (Theorem 13). Furthermore, we show that every adaptive randomized query algorithm requires queries, for any (Theorem 14). These lower bounds show that the number of queries executed by our randomized algorithm is optimal, up to a logarithmic factor, and that the logarithmic dependency on cannot be avoided entirely.
For an overview of our results, see Table 1.
1.2 Techniques
Algorithms.
The starting point for all our algorithms is an algorithm by Angluin and Chen [7] for GR that uses IS queries. Our randomized algorithm is in fact identical to their algorithm, up to the use of a different sampling probability. It works as follows:
The key idea is to learn all the non-edges rather than all the edges of the input graph. To this end, we sample sufficiently many random vertex-induced subgraphs , for integers , such that, for every non-edge , the probability that both and are contained in and are isolated in is . This is achieved by including every vertex in with probability . Observe that when and are isolated in then the oracle necessarily needs to include both and in the returned maximal independent set. The returned maximal independent set therefore serves as a witness that proves that the potential edge is not contained in the input graph. We then argue that, after repeating this process times, we have learnt all non-edges with high probability, which allows us to identify the edges of the input graph by complementing the set of non-edges.
Angluin and Chen use this idea in the context of IS queries for graphs with maximum degree , where the significantly lower inclusion probability of each vertex into each sample of needs to be used.
Our deterministic algorithm is built on a similar idea. We show that, when taking random subsets as above, i.e., by inserting every vertex into any subset with probability , then, for any tuple of disjoint vertices, which we also refer to as a witness, there exists a set such that but with positive probability. Since this event happens with non-zero probability, such a family of subsets exists. Our deterministic algorithm then queries all of these subsets . We now claim that our algorithm learns every non-edge : We know that there exists a set such that but since holds. The vertices and are thus isolated in and therefore necessarily reported in the oracle answer, which provides a proof to the algorithm that the edge is not contained in the input graph.
Both our non-adaptive randomized and deterministic algorithms require advanced knowledge of , which, as previously argued, is unavoidable. We turn both of these algorithms into adaptive algorithms that require only rounds of adaptivity and whose total number of queries executed is only by a constant factor larger. This is achieved by successively running our algorithms for the guesses for the maximum degree until a final guess is used, in which case it is easy to see that the graph is correctly reconstructed. While this is a standard doubling argument, the non-trivial part of the argument is to identify when the condition is reached since is now unknown. Denoting by the set of non-edges learnt by the execution of one of our algorithms when invoked on the current guess , we show that if the maximum degree in the graph spanned by the edges is at least then we have indeed learnt all the non-edge of the input graph, and thus also reconstructed the graph.
Lower Bounds.
We first discuss our lower bound for deterministic non-adaptive algorithms. Witnesses, i.e., tuples of disjoint vertices, play a key role in our lower bound argument. We argue that any deterministic non-adaptive query algorithm must be such that, for every witness of disjoint vertices, there exists a query such that and . We call a set of queries that fulfills this property a -Query-Scheme of size . To see that the previous property is true, for the sake of a contradiction, suppose that this is not the case and there exists a witness that does not fulfill these properties, i.e., whenever is included in a query then at least one of the vertices is included in this query as well. We claim that the two graphs and , where, in both graphs, are incident on and are incident on , and is an edge in but not in cannot be distinguished222This construction generates a maximum degree of in , which is technically not allowed since we consider algorithms that run on graphs of maximum degree . To circumvent this issue, in our actual proof we therefore relate algorithms that operate on maximum degree graphs to -Query-Schemes.. Indeed, we claim that, for any query executed, the oracle can always report an independent set that contains at most one of the two vertices , even on graph . This is because whenever is included in a query then at least one of the vertices is also included in this query. The oracle can therefore include in the oracle answer, which implies that at most one of and will also be included. Both graphs and are therefore consistent with all oracle answers and are thus indistinguishable, a contradiction.
Our task, therefore, is to prove that any -Query-Scheme must be of size at least . We achieve this by combining two separate arguments that address small queries, i.e., queries of size at most and large queries of size larger than , respectively: For any pair of vertices , small queries are such that they cover many witnesses , for many different vertices , since the vertices are chosen from , however, on average, vertex pairs cannot be included in many small queries (at most ). In contrast, vertex pairs can be included in many large queries, however, those queries do not cover many witnesses , for vertices . We obtain our result by combining these observations, see the proof of Lemma 10.
Next, we discuss our queries lower bound for adaptive randomized algorithms on -vertex cycles. Observe that, in an -vertex cycle , every pair of vertices is such that . Hence, for a query algorithm to be able to distinguish any two vertices of the input graph, the algorithm is required to obtain oracle answer maximal independent sets such that and do not behave the same in all returned independent sets. We show that queries are needed to distinguish between all the vertices. Our argument is based on the observation that, given a set of vertices that are so far indistinguishable and a query set , the returned maximal independent set only allows us to differentiate between the vertices of that are not queried, queried and contained in , and queried and not contained in . Every query thus allows us to partition any set of vertices that is still indistinguishable into only three subsets, which implies that queries are needed to distinguish between all the vertices.
Last, we discuss our queries lower bound for arbitrary randomized adaptive algorithms. We work with the family of graphs obtained from the family of all balanced bipartite graphs with , where the two bipartitions and are turned into two separate cliques. Observe that each such graph has a maximum independent set size of . We now claim that, for every non-edge in the input graph , there must exist an independent set returned by the oracle such that . This needs to be the case since otherwise the algorithm cannot distinguish between and the graph , i.e., with the edge added, since all query responses are then consistent with both and . However, since at most one non-edge can be learnt per query, and contains many graphs with non-edges, the result follows. Our actual lower bound is proved for randomized algorithms via an application of Yao’s Lemma, which makes the argument slightly more complicated.
1.3 Recent Developments
Since a preprint of our work was released on arXiv in early 2024, Michael and Scott [23] improved some of our lower bounds by poly-logarithmic factors. They show that, for arbitrary adaptive randomized algorithms, queries are needed, improving our lower bound by a factor. They further strengthen this bound to for non-adaptive randomized algorithms. For deterministic non-adaptive algorithms, they give a lower bound, improving over our lower bound.
Their improvements for randomized algorithms are obtained by using different and more involved graph constructions. Their lower bound for non-adaptive deterministic algorithms is obtained by establishing a connection between what we refer to as -Query-Schemes (which are equivalent to non-adaptive deterministic algorithms) and cover-free families, which are well-studied mathematical objects. They give an improved lower bound for a cover-free family with certain parameters, which, by the novel connection identified, translates to a lower bound for deterministic non-adaptive algorithms.
1.4 Outline
2 Algorithms
We give our randomized non-adaptive algorithm that executes queries in Subsection 2.1, and our deterministic non-adaptive algorithm that executes queries in Subsection 2.2. Both these algorithms require the maximum degree as part of their inputs. In Subsection 2.3, we show how these algorithms can be turned into adaptive algorithms with a similar number of queries that do not require advanced knowledge of .
2.1 Randomized Algorithm
Our algorithm, Algorithm 1, executes queries on random subsets , where each vertex is included in with probability , and outputs every pair as an edge of the input graph if is not contained in any maximal independent set , for all , returned by the oracle.
In the analysis, we prove that, for every non-edge in the input graph , both and are reported in any independent set with probability . Since independent sets are computed, any non-edge will be detected with high probability, and by the union bound, this then applies to all non-edges.
Theorem 1.
Algorithm 1 is a non-adaptive randomized algorithm that executes MIS queries and correctly reconstructs a graph of maximum degree with high probability.
Proof.
Let denote the input graph, the maximum degree, and let denote the set of non-edges. We will prove that, with high probability, every non-edge can be identified by the algorithm in that, for a non-edge , there exists an independent set such that with high probability.
Consider thus any non-edge . Then, for every , the probability that both and are reported in independent set is at least:
since both and need to be included in a maximal independent set if they are isolated vertices in .
Next, observe that the events “” and “” are independent since and are not adjacent and are thus not contained in . Furthermore, observe that . Then:
| (1) |
where we used the bound , which holds for every .
Next, we compute the probability that both endpoints of the non-edge are never reported together:
where we used the inequality , and the last inequality holds when is chosen to be large enough.
By the union bound, the probability that the endpoints of at least one of the non-edges are not reported in any independent set is therefore at most , which completes the proof.
2.2 Deterministic Algorithm
Central to our deterministic algorithm is the notion of a -Query-Scheme and a Witness:
Definition 2 (Witness).
Let be a set of vertices and let be an integer. Then, the tuple with being distinct vertices is called a Witness, and we denote by the set of all witnesses.
As it will be important in the proof of Lemma 6, we give an upper bound on the number of witnesses:
Lemma 3.
The number of witnesses is bounded by:
Proof.
We use the bound and obtain:
Definition 4 (-Query-Scheme).
Let be a set of vertices and let be an integer. The set is denoted a -Query-Scheme of size if, for every witness , there exists a query such that:
-
1.
, and
-
2.
.
In the following, we say that a query considers a witness if Items 1 and 2 hold for and .
We show in Lemma 5 that a -Query-Scheme of size immediately yields a non-adaptive deterministic query algorithm for GR for graphs of maximum degree that executes queries. Our task is thus to design a -Query-Scheme of small size, which we do in the proof of Lemma 6.
Lemma 5.
Let be a -Query-Scheme of size . Then, there exists a non-adaptive deterministic algorithm for GR that executes queries on graphs of maximum degree .
Proof.
Let be the input graph, and let be a -Query-Scheme of size . The algorithm executes every query . Let denote the query answer to .
We now claim that, for every non-edge in the input graph, there exists an independent set such that . To see this, denote by the neighbours of in and by the neighbours of in . Then, since is a -Query-Scheme, there exists a query such that , but none of ’s and ’s neighbours are included. Hence, both and are necessarily included in and the algorithm therefore observes a witness that proves that the edge does not exist in the input graph.
Since the argument applies to every non-edge, the algorithm learns all non-edges of the input graph and thus also learns all of the input graph’s edges by complementing the set of non-edges.
Lemma 6.
There exists a -Query-Scheme of size .
Proof.
For an integer whose value we will determine later, let be such that, for every , is the subset of obtained by including every vertex with probability . We use the probabilistic method and prove that is a -Query-Scheme with positive probability, which in turn implies that such a scheme exists.
To this end, let be distinct vertices. Then, for any , we obtain (the derivation is identical to Inequality 1 and therefore not repeated here)
Furthermore, the probability that there does not exist a query , for any , such that and is at most:
Hence, by the union bound over all witnesses (see Lemma 3), the probability that there exists a witness that is not considered by the -Query-Scheme is at most:
For this probability to be strictly below , it is enough to set
which in turn implies that such a scheme exists.
Corollary 7.
There exists a deterministic algorithm for GR that executes non-adaptive MIS queries for graphs with maximum degree .
2.3 Adaptive Algorithms without Knowledge of
We will now show how the randomized and deterministic algorithms from the previous sections can be turned into adaptive algorithms with similar query complexity that do not require knowledge of in order to operate. More specifically, let be a non-adaptive query algorithm that, given an integer , identifies with high probability in rounds all non-edges such that and hold. It immediately follows from our analyses that both our randomized and deterministic query algorithms have this property when executed with parameter instead of the true value of . For our randomized algorithm, we have , and, for our deterministic algorithm, we have . We will show how can be turned into an adaptive algorithm that does not require knowledge of and requires overall rounds.
This is achieved via the doubling strategy displayed in Algorithm 2.
The algorithm uses the notation , which is to be interpreted as the maximum degree in the graph spanned by the edges .
Lemma 8.
Let be a non-adaptive MIS-query algorithm that, given an integer , in rounds identifies with high probability all non-edges of the input graph that have the property that and . We assume that is at least linear in , i.e., . Then, there exists an adaptive algorithm for GR that succeeds with high probability, does not require advanced knowledge of , runs in rounds, and requires rounds of adaptivity.
Proof.
Let denote the input graph and the maximum degree.
We will argue that when exiting the last iteration of the while loop of the algorithm, the inequalities
| (2) |
hold. The lower bound establishes correctness since the last run of is executed with a guess , which implies that all non-edges are correctly identified. The upper bound is required in order to bound the query complexity of the algorithm.
We first argue the lower bound in Inequality 2. Observe that, in any iteration of the loop, holds since is a subset of the non-edges and . Thus, we have . When exiting the last iteration of the algorithm, we have since otherwise this would not be the last iteration of the algorithm. Combining these two inequalities yields .
Regarding the upper bound, we have just proved that, when exiting the last iteration, holds. It is then also clear that a run with is executed since the guess is doubled in each iteration. We claim that this run with is indeed the last iteration of the algorithm. Indeed, in this iteration, all non-edges are correctly identified since , which implies that constitutes the edge set of the input graph. This further implies that . The condition in the while loop then ensures that this is indeed the last iteration of the algorithm.
Last, regarding the runtime of the algorithm, let be the guess used in the last iteration. We then have that . Then, under the assumption that , we have:
which establishes the query complexity of the algorithm. Last, since the while loop is executed times, the algorithm requires only rounds of adaptivity.
The previous lemma together with our randomized and deterministic algorithms from the previous sections yield the following corollary.
Corollary 9.
There are randomized and deterministic adaptive MIS-query algorithms for GR that execute and queries, respectively, require rounds of adaptivity, and do not require advanced knowledge of in order to operate. The randomized algorithm succeeds with high probability.
3 Lower Bounds
In this section, we give three lower bound results. First, in Subsection 3.1, we consider the class of non-adaptive deterministic query algorithms and we prove that such algorithms require queries. This result renders our deterministic query algorithm optimal, up to poly-logarithmic factors, and it also establishes a separation result between deterministic and randomized query algorithms, since, as demonstrated by our randomized query algorithm, non-adaptive randomized queries are sufficient.
Next, in Subsection 3.2, we show that queries are needed for query algorithms that may be adaptive and randomized, and that queries are needed for such algorithms, even if the input graph is an -vertex cycle.
3.1 Lower Bound for Non-adaptive Deterministic Algorithms
We will first show in Lemma 10 that any -Query-Scheme must be of size at least . Then, we argue in Lemma 11 that the queries executed by any non-adaptive deterministic query algorithm must constitute a -Query-Scheme. These two lemmas together then imply our main result of this section as stated in Corollary 12, i.e., that non-adaptive deterministic query algorithms require queries.
Lemma 10.
For every , every -Query-Scheme is of size .
Proof.
Let be a suitably large constant, and suppose that there exists a -Query-Scheme of size . We will show by contradiction that a -Query-Scheme of this size does not exist. Furthermore, we define a relevant query size threshold by .
Given , we will first argue that there exists a pair of disjoint vertices such that there is no query with , and there are at most queries of size at most that contain both . Then, once we have established that such a pair exists then we use the probabilistic method to show that there exist disjoint vertices different to such that the witness is not considered in , a contradiction to the fact that is a -Query-Scheme. This implies that a query scheme of size cannot exist.
Denote by the set of subsets of of size , i.e., , where denotes the set of vertices of the input graph, and observe that .
First, observe that at most queries in are of size exactly . Hence, as long as , at most a -fraction of is part of queries of size . Observe that, since the statement of the lemma assumes that , we have
where we assumed that is large enough, and the last inequality holds for every . Observe that a query of size immediately considers all witnesses of the form , for any vertices . The previous argument shows that this can happen for at most a -fraction of pairs .
Next, we argue that at least a -fraction of the pairs are such that is contained in at most queries of size at most in . To this end, observe that any query of size at most contains at most distinct pairs , and since there are overall queries, at most
pairs appear overall in all queries of size at most . If less than a -fraction of pairs were contained in at most queries of size at most in , then at least a -fraction of pairs were contained in more than queries of size at most . But this implies that the following inequality must hold:
This inequality, however, does not hold for , a contradiction. Hence, we have proved that at least a -fraction of pairs are such that is contained in at most queries of size at most in .
Consider thus a pair that is included in at most queries of size at most , and that is not included in a query of size . The arguments above ensure that such a pair exists. Denote by the set of queries that contain both and , and suppose that are the queries of size at most (which implies ). We now claim that there is a witness , for some vertices , that is not considered by the queries , contradicting the assumption that is a -Query-Scheme, which then completes the proof.
The witness is constructed as follows. For , let be any element. Recall that is not contained in any query of size exactly , hence, is well-defined. Furthermore, let . We observe that the elements are not necessarily disjoint, and, hence, . Next, let be a random subset of size . Then, our witness is obtained as , and we will prove in the following that, with positive probability, this witness is indeed not considered by . This implies that there exists a witness that is not considered by , which completes the proof.
Consider now any query , for . Then, the probability that considers is bounded as follows:
Since is a large enough constant, by the union bound, the probability that at least one query of the queries considers is therefore strictly below (recall that there are at most queries, and holds, for large enough ). Hence, there exists a witness that is not considered by these queries, which completes the proof.
Lemma 11.
Let be a non-adaptive deterministic query algorithm for GR on graphs of maximum degree . Then, the queries executed by form a -Query-Scheme.
Proof.
For the sake of a contradiction, suppose that there exists a sequence of queries that does not form a -Query-Scheme but still allows the algorithm to learn the input graph exactly. Since the queries do not form a -Query-Scheme, there exists a witness that is not considered by the queries. Consider now any two input graphs and of maximum degree that have the following properties: ’s neighbours are , and ’s neighbours are . In , there is also an edge between and , and in there is no edge between and . Observe that the degrees of and in and are and , respectively. The maximum degrees in and are therefore at most .
Now, we claim that, for every query , the oracle can respond with an independent set that does not include both vertices and . Observe that the algorithm therefore cannot learn whether the edge exists since both and are consistent with all query answers. The algorithm therefore cannot distinguish between the two graphs , which then completes the argument.
Since does not consider , there exists a vertex such that . Hence, the oracle can construct an independent set starting with vertex (e.g., by running the Greedy maximal independent set algorithm where is the first vertex picked), which implies that either or cannot be included in the independent set. This completes the proof.
Remark.
The proof of the previous lemma assumes that the oracle can identify a witness not considered by the queries submitted by the algorithm. This is only possible if all queries are submitted simultaneously to the oracle. Observe that this is a valid assumption since we consider the class of non-adaptive algorithms, and such algorithms equally work when all queries are submitted simultaneously.
Combining Lemma 10 with Lemma 11, we obtain the main lower bound result of this section as a corollary:
Corollary 12.
Every deterministic non-adaptive query algorithm for GR requires queries.
3.2 Lower Bounds for Adaptive Randomized Algorithms
We first prove that, even on an -vertex cycle, any randomized adaptive algorithm requires queries to solve GR. Our proof is based on an indistinguishability argument: At least queries are needed so that, for each pair of vertices , different outcomes from the oracle are observed, which is needed to distinguish all possible cycles from each other.
Theorem 13.
Every possibly randomized query algorithm with success probability strictly above for Graph Reconstruction using a Maximal Independent Set oracle on an -vertex cycle requires queries.
Proof.
Denote by the family of -vertex cycles on the vertex set with , which serve as the input to our algorithm.
Let be a randomized query algorithm that executes rounds and that, on any input , succeeds with probability strictly above . Then, by Yao’s lemma, there exists a deterministic query algorithm that succeeds on strictly more than half of the inputs in and also runs in rounds.
We first observe that, since is deterministic, and we also assume that the oracle answers are deterministic, for every , there exists a unique execution of , i.e., a sequence of query vertices , query answers , and output produced by the algorithm. We observe that may be different from if the algorithm errs on input .
Denote by the subset of inputs on which the algorithm succeeds, and let denote the inputs on which the algorithm fails. We will now argue that, for every input on which the algorithm succeeds, there exists a unique input on which the algorithm fails, i.e., there exists an injective mapping from into , which implies that . This is a contradiction to the fact that is correct on strictly more than half of the instances in , which in turn implies that the algorithm does not exist.
Let be an input on which succeeds. Denote by the vertices queried by the algorithm in the query rounds and by the query responses. We associate the following complete ternary tree with layers to the query vertices and responses in an execution of :
-
The root (layer 1) is labelled with .
-
For an internal node in layer with label , the node has three children with labels such that , and
queried and reported queried and not reported not queried
Denote by the leaves of from left-to-right. We observe that, by construction, the leaves are disjoint and their union equals , i.e., . Consider now a leaf that contains at least two vertices . The vertices are indistinguishable as they behaved exactly the same throughout the execution of the algorithm. Indistinguishable here means that the cycle obtained from the cycle by swapping the positions of and leads to the exact same execution of the algorithm as the execution for . The algorithm, however, fails on as the output produced is . Hence, as long as there exists a leaf that contains at least two vertices, we can identify an input on which the algorithm fails and establish our injective mapping from to . To avoid that there are no leaves that contain two vertices, the number of leaves in must be at least . Since is ternary and of depth , we obtain that must hold, a contradiction to the assumption that . This completes the proof.
Last, we give our queries lower bound for graphs of maximum degree . The key observation in our proof is that, for every non-edge in the input graph , there must exist an oracle response maximal independent set that contains both vertices and , since, if the opposite was true then the algorithm could not distinguish between the input graph and the graph .
Theorem 14.
For any , every possibly randomized query algorithm with success probability strictly greater than for Graph Reconstruction using a Maximal Independent Set oracle requires queries on graphs of maximum degree .
Proof.
For integers , let denote the set of all bipartite graphs with . Then, let be the family of graphs obtained from by turning the bipartitions and of each of its graphs into (disjoint) cliques.
Let denote a randomized query algorithm that succeeds with probability strictly above on each input , and for the sake of a contradiction, we assume that executes at most queries. Then, by Yao’s lemma, there exists a deterministic query algorithm that also executes at most queries and succeeds on strictly more than half of the inputs in .
First, observe that , for every since at most one vertex from and at most one vertex from can be included in any independent set. Hence, every independent set reported by on any of the input graphs is of size at most .
Denote by the subset of inputs on which succeeds, and let . We will show that there exists an injective map from to , which implies that , a contradiction to the fact that succeeds on strictly more than half of the instances. This in turn implies that algorithm does not exists, which establishes the theorem.
To this end, let be any instance and consider the execution of on , i.e., let denote the queries submitted in the query rounds, let denote the query responses, and let denote the output produced by the algorithm.
Let
i.e., the set of pairs of vertices that do not constitute a response from the oracle. We observe that every graph obtained from by flipping the edge , i.e., by introducing in in case it is not contained in or by removing it from in case it is contained in , leads to the exact same execution of as all oracle answers are consistent with both and . The algorithm however fails on since the answer produced by the execution is . Hence, as long as , we can identify a graph on which the algorithm fails and map the input to in our injective map. To ensure that , we require at least queries. This however is a contradiction to the assumption that only queries are executed.
Last, the result follows by observing that the maximum degree of any graph in is , which implies that at least query rounds are required. We remark that Theorem 14 also shows that queries are needed for reconstructing graphs on edges. Observe that every graph of the family of graphs used in Theorem 14 has edges.
4 Conclusion
In this paper, we initiated the study of the GR problem using an MIS oracle. We gave a non-adaptive randomized algorithm that reconstructs a graph with maximum degree using queries, and a non-adaptive deterministic query algorithm that uses queries. Both these algorithms require advanced knowledge of in order to operate, which is unavoidable. We showed that both algorithms can be turned into adaptive algorithms with rounds of adaptivity and a similar number of queries that do not require advanced knowledge of . We also proved that, for adaptive randomized algorithms, queries are necessary, and that such algorithms require queries even if the input graph is an -vertex cycle. Furthermore, we showed that non-adaptive deterministic query algorithms require queries, which renders our deterministic algorithm optimal up to poly-log factors.
While, up to lower order terms, the MIS query complexity of GR is settled for randomized algorithms and for non-adaptive deterministic algorithms, it is unclear whether there are adaptive deterministic algorithms that are stronger than non-adaptive deterministic algorithms. More concretely, is there an adaptive deterministic query algorithm that requires fewer than queries?
References
- [1] Hasan Abasi and Nader H. Bshouty. On learning graphs with edge-detecting queries. In Aurélien Garivier and Satyen Kale, editors, Algorithmic Learning Theory, ALT 2019, 22-24 March 2019, Chicago, Illinois, USA, volume 98 of Proceedings of Machine Learning Research, pages 3–30. PMLR, 2019. URL: http://proceedings.mlr.press/v98/abasi19a.html.
- [2] Mikkel Abrahamsen, Greg Bodwin, Eva Rotenberg, and Morten Stöckel. Graph reconstruction with a betweenness oracle. In Nicolas Ollinger and Heribert Vollmer, editors, 33rd Symposium on Theoretical Aspects of Computer Science, STACS 2016, February 17-20, 2016, Orléans, France, volume 47 of LIPIcs, pages 5:1–5:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPICS.STACS.2016.5.
- [3] Raghavendra Addanki, Andrew McGregor, and Cameron Musco. Non-adaptive edge counting and sampling via bipartite independent set queries. In Shiri Chechik, Gonzalo Navarro, Eva Rotenberg, and Grzegorz Herman, editors, 30th Annual European Symposium on Algorithms, ESA 2022, September 5-9, 2022, Berlin/Potsdam, Germany, volume 244 of LIPIcs, pages 2:1–2:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.ESA.2022.2.
- [4] Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, and Anthony Wirth. Correlation clustering in data streams. In Francis R. Bach and David M. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, pages 2237–2246. JMLR.org, 2015. URL: http://proceedings.mlr.press/v37/ahn15.html.
- [5] Noga Alon and Vera Asodi. Learning a hidden subgraph. SIAM J. Discret. Math., 18(4):697–712, 2005. doi:10.1137/S0895480103431071.
- [6] Noga Alon, Richard Beigel, Simon Kasif, Steven Rudich, and Benny Sudakov. Learning a hidden matching. SIAM J. Comput., 33(2):487–501, 2004. doi:10.1137/S0097539702420139.
- [7] Dana Angluin and Jiang Chen. Learning a hidden graph using o(logn) queries per edge. J. Comput. Syst. Sci., 74(4):546–556, 2008. doi:10.1016/J.JCSS.2007.06.006.
- [8] Sepehr Assadi, Christian Konrad, Kheeran K. Naidu, and Janani Sundaresan. O(log log n) passes is optimal for semi-streaming maximal independent set. In Bojan Mohar, Igor Shinkar, and Ryan O’Donnell, editors, Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, Vancouver, BC, Canada, June 24-28, 2024, pages 847–858. ACM, 2024. doi:10.1145/3618260.3649763.
- [9] Sepehr Assadi and Shay Solomon. When algorithms for maximal independent set and maximal matching run in sublinear time. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 17:1–17:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.ICALP.2019.17.
- [10] Paul Beame, Sariel Har-Peled, Sivaramakrishnan Natarajan Ramamoorthy, Cyrus Rashtchian, and Makrand Sinha. Edge estimation with independent set oracles. In Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, volume 94 of LIPIcs, pages 38:1–38:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPICS.ITCS.2018.38.
- [11] Richard Beigel, Noga Alon, Simon Kasif, Mehmet Serkan Apaydin, and Lance Fortnow. An optimal procedure for gap closing in whole genome shotgun sequencing. In Thomas Lengauer, editor, Proceedings of the Fifth Annual International Conference on Computational Biology, RECOMB 2001, Montréal, Québec, Canada, April 22-25, 2001, pages 22–30. ACM, 2001. doi:10.1145/369133.369152.
- [12] Lidiya Khalidah binti Khalil and Christian Konrad. Constructing large matchings via query access to a maximal matching oracle. In Nitin Saxena and Sunil Simon, editors, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2020, December 14-18, 2020, BITS Pilani, K K Birla Goa Campus, Goa, India (Virtual Conference), volume 182 of LIPIcs, pages 26:1–26:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.FSTTCS.2020.26.
- [13] Uriel Feige. On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SIAM Journal on Computing, 35(4):964–984, 2006. doi:10.1137/S0097539704447304.
- [14] David Galvin. Three tutorial lectures on entropy and counting, 2014. arXiv:1406.7872.
- [15] Mohsen Ghaffari. An improved distributed algorithm for maximal independent set. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 270–277. SIAM, 2016. doi:10.1137/1.9781611974331.CH20.
- [16] Mohsen Ghaffari, Themis Gouleakis, Christian Konrad, Slobodan Mitrovic, and Ronitt Rubinfeld. Improved massively parallel computation algorithms for mis, matching, and vertex cover. In Calvin Newport and Idit Keidar, editors, Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, PODC 2018, Egham, United Kingdom, July 23-27, 2018, pages 129–138. ACM, 2018. doi:10.1145/3212734.3212743.
- [17] Oded Goldreich and Dana Ron. Approximating average parameters of graphs. In Josep Díaz, Klaus Jansen, José D. P. Rolim, and Uri Zwick, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 363–374, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. doi:10.1007/11830924_34.
- [18] Vladimir Grebinski and Gregory Kucherov. Optimal query bounds for reconstructing a hamiltonian cycle in complete graphs. In Fifth Israel Symposium on Theory of Computing and Systems, ISTCS 1997, Ramat-Gan, Israel, June 17-19, 1997, Proceedings, pages 166–173. IEEE Computer Society, 1997. doi:10.1109/ISTCS.1997.595169.
- [19] Sampath Kannan, Claire Mathieu, and Hang Zhou. Graph reconstruction and verification. ACM Trans. Algorithms, 14(4):40:1–40:30, 2018. doi:10.1145/3199606.
- [20] Christian Konrad, Kheeran K. Naidu, and Arun Steward. Maximum matching via maximal matching queries. In Petra Berenbrink, Patricia Bouyer, Anuj Dawar, and Mamadou Moustapha Kanté, editors, 40th International Symposium on Theoretical Aspects of Computer Science, STACS 2023, March 7-9, 2023, Hamburg, Germany, volume 254 of LIPIcs, pages 41:1–41:22. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.STACS.2023.41.
- [21] Claire Mathieu and Hang Zhou. A simple algorithm for graph reconstruction. In Petra Mutzel, Rasmus Pagh, and Grzegorz Herman, editors, 29th Annual European Symposium on Algorithms, ESA 2021, September 6-8, 2021, Lisbon, Portugal (Virtual Conference), volume 204 of LIPIcs, pages 68:1–68:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.ESA.2021.68.
- [22] Andrew McGregor and Rik Sengupta. Graph reconstruction from random subgraphs. In Mikolaj Bojanczyk, Emanuela Merelli, and David P. Woodruff, editors, 49th International Colloquium on Automata, Languages, and Programming, ICALP 2022, July 4-8, 2022, Paris, France, volume 229 of LIPIcs, pages 96:1–96:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.ICALP.2022.96.
- [23] Lukas Michel and Alex Scott. Lower bounds for graph reconstruction with maximal independent set queries, 2024. doi:10.48550/arXiv.2404.03472.
- [24] Lev Reyzin and Nikhil Srivastava. Learning and verifying graphs using queries with a focus on edge counting. In Marcus Hutter, Rocco A. Servedio, and Eiji Takimoto, editors, Algorithmic Learning Theory, pages 285–297, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. doi:10.1007/978-3-540-75225-7_24.
- [25] Guozhen Rong, Wenjun Li, Yongjie Yang, and Jianxin Wang. Reconstruction and verification of chordal graphs with a distance oracle. Theoretical Computer Science, 859:48–56, 2021. doi:10.1016/j.tcs.2021.01.006.
- [26] Guozhen Rong, Yongjie Yang, Wenjun Li, and Jianxin Wang. A divide-and-conquer approach for reconstruction of C?5-free graphs via betweenness queries. Theoretical Computer Science, 917:1–11, 2022. doi:10.1016/j.tcs.2022.03.008.
- [27] Václav Rozhon and Mohsen Ghaffari. Polylogarithmic-time deterministic network decomposition and distributed derandomization. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 350–363. ACM, 2020. doi:10.1145/3357713.3384298.
Appendix A Independent Set Queries for Graph Reconstruction
For completeness, we will now prove that the number of Independent Set queries (or, in fact, any type of query that yields a binary answer) needed for GR on graphs of maximum degree is .
Angluin and Chen [7] observe that, since an independent set query has a binary output, at least queries are needed to distinguish any two graphs in a graph family .
We will now prove that the number of -vertex graphs with maximum degree is , which, by the previous argument, implies that Independent Set queries are needed to distinguish these graphs.
Our proof uses entropy-based arguments. We refer the reader to [14] for an excellent overview of how entropy is connected to counting problems.
Lemma 15.
The number of bipartite -vertex graphs with bipartitions and each of size and with maximum degree is at least:
Proof.
We consider the following probabilistic process: Let be a bipartite -vertex graph with obtained by inserting every potential edge into with probability . Denote by the indicator random variable of the event that does not contain a vertex of degree larger than .
We will now bound the quantity from below, which constitutes a set of bipartite graphs with maximum degree .
To this end, first, observe that:
which implies that it is enough to bound . To bound this quantity, we apply the chain rule for entropy twice on the expression :
which implies:
| (3) |
using the fact that entropy is non-negative, and that the inequality holds.
Before bounding Inequality A further, we first prove that is small and we give a bound on .
To see that is small, consider any vertex . Then, the expected degree of in is , and, by a Chernoff bound, the probability that the degree of is larger than is at most (using the assumption that ). By the union bound, the probability that there exists a vertex of degree larger than is thus at most , or, equivalently, .
Next, we bound . Since each of the potential edges is included in independently of all other edges, we obtain:
where we bounded the binary entropy function by considering only one of its the two terms.
We are now ready to further simplify Inequality A, which then yields the result:
Corollary 16.
The number of Independent Set queries needed for GR on -vertex graphs of maximum degree is .
