Edge Estimation with Independent Set Oracles

We study the task of estimating the number of edges in a graph, where the access to the graph is provided via an independent set oracle. Independent set queries draw motivation from group testing and have applications to the complexity of decision versus counting problems. We give two algorithms to estimate the number of edges in an n-vertex graph, using (i) polylog(n) bipartite independent set queries or (ii) n^{2/3} polylog(n) independent set queries.


INTRODUCTION
We investigate the problem of estimating the number of edges in a simple, unweighted, undirected graph G = ([n], E), where [n] := {1, 2, . . . , n} and m = |E|. Here, the only access to the graph is via queries to an independent set oracle: an independent set (IS) query answers whether a given set of vertices is independent in G, and a bipartite independent set (BIS) query answers whether there is any edge between two given disjoint sets of vertices.

The gap between counting and emptiness queries has a well-known geometric analogue. In three dimensions, for a set P of n points, half-space counting queries (i.e., what is the size of the set |P ∩ h|, for a query half-space h) can be answered in O(n^{2/3}) time, after near-linear time preprocessing. However, emptiness queries (i.e., is the set P ∩ h empty?) can be answered in O(log n) time. Aronov and Har-Peled [6] used this to show how to answer approximate counting queries (i.e., estimating |P ∩ h|), with polylogarithmic emptiness queries.
As another geometric example, consider the task of counting edges in disk intersection graphs using GPUs [24]. For these graphs, IS queries decide if a subset of the disks has any intersection (this can be done using sweeping in O(n log n) time [11]). Using a GPU, one could quickly draw the disks and check if the sets share a common pixel. In cases like this, when IS and BIS oracles have fast implementations, algorithms exploiting independent set queries may be useful.
Decision versus counting complexity. A generalization of IS and BIS queries previously appeared in a line of work investigating the relationship between decision and counting problems [15,33,34]. Stockmeyer [33,34] showed how to estimate the number of satisfying assignments for a circuit with queries to an NP oracle. Ron and Tsur [31] observed that Stockmeyer implicitly provided an algorithm for estimating set cardinality using subset queries, where a subset query specifies a subset X ⊆ U and answers whether |X ∩ S | = 0 or not. Subset queries are significantly more general and flexible than IS and BIS queries because S corresponds to the set of edges in the graph and X is any subset of pairs of vertices. Namely, IS and BIS queries can be interpreted as restricted subset queries. In particular, the algorithms mentioned cannot be implemented directly using IS or BIS queries.
Indeed, consider subset queries in the context of estimating the number of edges in a graph. To this end, fix |S| = m (i.e., the number of edges in the graph) and |U| = n(n − 1)/2 (the number of possible edges). Stockmeyer provided an algorithm using only O(log log m · poly(1/ε)) subset queries to estimate m within a factor of (1 + ε) with constant success probability. Note that for a high probability bound, which is what we focus on in this article, the algorithm would naively require O(log n · log log m · poly(1/ε)) queries to achieve success probability at least 1 − 1/n. Falahatgar et al. [22] gave an improved algorithm that estimates m up to a factor of (1 + ε) with probability 1 − δ using 2 log log m + O((1/ε^2) log(1/δ)) subset queries. Nearly matching lower bounds are also known for subset queries [22, 31, 33, 34]. Ron and Tsur [31] also studied a restriction of subset queries, called interval queries, where they assume that the universe U is ordered and the subsets must be intervals of elements. We view the independent set queries that we study as another natural restriction of subset queries.
Analogous to Stockmeyer's results, a recent work of Dell and Lapinskas [15] provides a framework that relates edge estimation using BIS and edge existence queries to a question in fine-grained complexity. They study the relationship between decision and counting versions of problems such as 3SUM and Orthogonal Vectors. They proved that for a bipartite graph, using O(ε^{-2} log^6 n) BIS queries and ε^{-4} n polylog(n) edge existence queries, one can output a number m̂ such that with probability at least 1 − 1/n^2, we have (1 − ε)m ≤ m̂ ≤ (1 + ε)m.
Dell and Lapinskas [15] used edge estimation to obtain approximate counting algorithms for problems in fine-grained complexity. For instance, given an algorithm for 3SUM with runtime T, they obtain an algorithm that estimates the number of YES instances of 3SUM with runtime O(T ε^{-2} log^6 n) + ε^{-4} n polylog(n). The relationship is simple. The decision version of 3SUM corresponds to checking if there is at least one edge in a certain bipartite graph. The counting version then corresponds to counting the edges in this graph. We note that in their application, the large number O(n polylog(n)) of edge existence queries does not affect the dominating term in the overall time in their reduction; the larger term in the time is a product of the time to decide 3SUM and the number of BIS queries.

Table 1. Results for estimating the number of edges with various query types.

  Query type         Approximation   Number of queries                   Reference
  Edge existence     1 + ε           (n^2/m) poly(log n, 1/ε)            Folklore (see Section 5.2)
  Degree             2 + ε           √n log n / ε                        [23]
  Degree + neighbor  1 + ε           √n poly(log n, 1/ε)                 [25]
  Subset             1 + ε           poly(log n, 1/ε)                    [22, 34]
  BIS                1 + ε           n poly(log n, 1/ε)                  [15]
  BIS                1 + ε           poly(log n, 1/ε)                    This work
  IS                 1 + ε           min(√m, n^2/m) poly(log n, 1/ε)     This work

The bounds stated are for high probability results, with error probability at most 1/n. Constant factors are suppressed for readability.

Our Results
We describe two new algorithms. Let G = ( n , E) be a simple graph with m = |E| edges.
The bipartite independence oracle. We present an algorithm that uses BIS queries and computes an estimate m̂ for the number of edges in G such that (1 − ε)m ≤ m̂ ≤ (1 + ε)m. The algorithm performs O(ε^{-4} log^{14} n) BIS queries and succeeds with high probability (see Theorem 4.9 for a precise statement). Ignoring the cost of the queries, the running time is near linear (we mostly ignore running times in this article because query complexity is our main resource). Since polylog(n) BIS queries can simulate a degree query (see Section 4.4), one can obtain a (2 + ε)-approximation of m by using Feige's algorithm [23], which uses degree queries. This gives an algorithm that uses O(√n polylog(n)/poly(ε)) BIS queries. Our new algorithm provides significantly better guarantees in terms of both the approximation and the number of BIS queries.
The result is somewhat more general than stated previously. One can use the algorithm to estimate the number of edges in any induced subgraph of the original graph. Similarly, one can estimate the number of edges in the graph between any two disjoint subsets of vertices U, V ⊆ [n]. In other words, the algorithm can estimate the size of E(U, V).

Compared to the result of Dell and Lapinskas [15], our algorithm uses exponentially fewer queries, since we do not spend n polylog(n) edge existence queries. Our improvement does not seem to imply anything for their applications in fine-grained complexity. We leave open the question of finding problems where a more efficient BIS algorithm would lead to new decision versus counting complexity results.
The ordinary independence oracle. We also present a second algorithm, using only IS queries, to compute a (1 + ε)-approximation. It performs O(ε^{-4} log^5 n + min(n^2/m, √m) · ε^{-2} log^2 n) IS queries (see Theorem 5.8). In particular, the number of IS queries is bounded by O(ε^{-4} log^5 n + ε^{-2} n^{2/3} log^2 n). The first term in the minimum (i.e., ≈ n^2/m) comes from a folklore algorithm for estimating set cardinality using membership queries (see Section 2.3). The second term in the minimum (i.e., ≈ √m) is the number of queries used by our new algorithm. We observe that BIS queries are surprisingly more effective for estimating the number of edges than IS queries. Shedding light on this dichotomy is one of the main contributions of this work.
Comparison with other queries. Table 1 summarizes the results for estimating the number of edges in a graph in the context of various query types. Given some of the results in Table 1 on edge estimation using other types of queries, a natural question is how well BIS and IS queries can simulate such queries. In Section 4.4, we show that O (ε −2 log n) BIS queries are sufficient to simulate degree queries. However, we do not know how to simulate a neighbor query (to find a specific neighbor) with few BIS queries, but a random neighbor of a vertex can be found with O (log n) BIS queries (see the work of Ben-Eliezer et al. [8]). For IS queries, it turns out that estimating the degree of a vertex v up to a constant factor requires at least Ω(n/deg(v)) IS queries (see Section 5.3).
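The degree-query simulation rests on a simple observation: for a fixed vertex v, a BIS query on ({v}, Q) reports exactly whether N(v) ∩ Q is empty, so estimating deg(v) = |N(v)| reduces to set-size estimation with an emptiness oracle (Lemma 2.9). A minimal Python sketch of this reduction, with the graph and both oracles simulated directly (all names here are illustrative, not from the article):

```python
def bis(graph, U, V):
    """BIS oracle: True iff there is no edge between disjoint U and V."""
    return all(v not in graph[u] for u in U for v in V)

def neighborhood_emptiness(graph, v, Q):
    """Emptiness oracle for N(v): True iff N(v) ∩ Q = ∅, realized by
    the single BIS query on ({v}, Q - {v})."""
    return bis(graph, {v}, Q - {v})

# Example: the path 1-2-3-4 stored as adjacency sets; N(2) = {1, 3}.
graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

Feeding `neighborhood_emptiness` into the emptiness-oracle estimator of Lemma 2.9 then yields a (1 ± ε)-estimate of deg(v) with O(ε^{-2} log n) BIS queries.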
Notation. Throughout, log and ln denote the logarithm taken in base 2 and e, respectively. For integers u ≤ k, let [k] = {1, . . . , k} and [u : k] = {u, . . . , k}. The notation x = polylog(n) means x = O(log^c n) for some constant c > 0. A collection of disjoint sets U_1, . . . , U_k such that ∪_i U_i = U is a partition of the set U into k parts (a part U_i might be an empty set). In particular, a (uniformly) random partition of U into k parts is chosen by coloring each element of U with a random number in [k] and identifying U_i with the elements colored with i.
Throughout, we use G = ([n], E) to denote the input graph. The number of edges in G is denoted by m = |E|. For a set U ⊆ [n], let E(U) = {uv ∈ E | u, v ∈ U} be the set of edges between vertices of U in G. For two disjoint sets U, V ⊆ [n], let E(U, V) = {uv ∈ E | u ∈ U, v ∈ V} denote the set of edges between U and V. Let m(U) and m(U, V) denote the number of edges in E(U) and E(U, V), respectively. We also abuse notation and let m(H) be the number of edges in a subgraph H (e.g., m(G) = m).
High probability conventions. Throughout the article, the randomized algorithms presented succeed with high probability; that is, with probability ≥ 1 − 1/n^{Ω(1)}. Formally, this means the probability of success is ≥ 1 − 1/n^c, for some arbitrary constant c > 0. For all of these algorithms, the value of c can be increased to any arbitrary value (i.e., improving the probability of success of the algorithm) by increasing the asymptotic running time of the algorithm by a constant factor that depends only on c. For the sake of simplicity of exposition, we do not explicitly keep track of these constants (which are relatively well behaved).

Overview of the Algorithms
1.3.1 The BIS Algorithm. Our discussion of the BIS algorithm follows Figure 1, which depicts the main components of one level of our recursive algorithm. Our algorithms rely on several building blocks, as described next.
Exactly count edges. One can exactly count the edges between two subsets of vertices, with a number of queries that scales nearly linearly in the number of such edges. Specifically, a simple deterministic divide-and-conquer algorithm to compute m(U, V) using O(m(U, V) log n) BIS queries is described later in Lemma 4.1.
Sparsify. The idea is now to sparsify the graph in such a way that the number of remaining edges is a good estimate for the original number of edges (after scaling). Consider sparsifying the graph by coloring the vertices of graph and only looking at the edges going between certain pairs of color classes (in our algorithm, these pairs are a matching of the color classes). We prove that it suffices to only count the edges between these color classes, and we can ignore the edges with both endpoints inside a single color class.
For any k satisfying 1 ≤ k ≤ ⌊n/2⌋, let U_1, . . . , U_k, V_1, . . . , V_k be a uniformly random partition of [n]. Then, with high probability, we have

    | m(G) − 2k Σ_{i=1}^{k} m(U_i, V_i) | ≤ c k √m log n,

where c is some constant. For the proof of this inequality, see Section 3. Specifically, if we set G_i to be the induced bipartite subgraph on U_i and V_i, then 2k Σ_i m(G_i) is a good estimate for m(G).

Figure 1. In the first step, we color the vertices and sparsify the graph by only looking at the edges between vertices of the same color. In the second step, we coarsely estimate the number of edges in each colored subgraph. Next, we group these subgraphs based on their coarse estimates, and we subsample from the groups with a relatively large number of edges. In the final step, we exactly count the edges in the sparse subgraphs, and we recurse on the dense subgraphs.

Now the graph is bipartite. The preceding sparsification method implies that we can assume without loss of generality that the graph is bipartite. Indeed, invoking the lemma with k = 1, we see that estimating the number of edges between the two color classes is equivalent to estimating the total number of edges, up to a factor of 2. For the rest of the discussion, we will consider colorings that respect the bipartition.
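As a sanity check, the sparsification estimator 2k Σ_i m(U_i, V_i) is easy to simulate. The sketch below (illustrative names; the real algorithm never sees the edge list, of course) randomly 2k-colors the vertices, matches color class i with class k + i, and scales the count of crossing edges by 2k; each edge survives the matching with probability 1/(2k), so the estimator is unbiased:

```python
import random

def sparsified_estimate(n, edges, k, rng):
    """One sparsification round: color [n] with 2k colors, match color
    class i with class k + i, and count only edges whose endpoint colors
    form a matched pair, scaled back up by 2k."""
    color = {v: rng.randrange(2 * k) for v in range(n)}
    crossing = sum(1 for (u, v) in edges if abs(color[u] - color[v]) == k)
    return 2 * k * crossing

# Example: the complete graph K_20 has m = 190 edges; the estimator is
# unbiased, so its average over repetitions concentrates near 190.
rng = random.Random(0)
n = 20
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
avg = sum(sparsified_estimate(n, edges, 2, rng) for _ in range(300)) / 300
```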
Coarse estimator. We give an algorithm that coarsely estimates the number of edges in a (bipartite) subgraph, up to a O(log^2 n) factor, using only O(log^3 n) BIS queries.
The subproblems. After coloring the graph, we have reduced the problem to estimating the total number of edges in a collection of (disjoint) bipartite subgraphs. However, certain subgraphs may still have a large number of edges, and it would be too expensive to directly use the exact counting algorithm on them.
Reducing the number of subgraphs in a collection via importance sampling. Using the coarse estimates, we can form O (log n) groups of bipartite subgraphs, where each group contains subgraphs with a comparable number of edges. For the groups with only a polylogarithmic number of edges, we can exactly count edges using polylog(n) BIS queries via the exact count algorithm mentioned earlier. For the remaining groups, we subsample a polylogarithmic number of subgraphs from each group. This new estimate is a good approximation to the original quantity, with high probability. This corresponds to the technique of importance sampling that is used for variance reduction when estimating a sum of random variables that have comparable magnitudes.
Sparsify and reduce. We use the sparsification algorithm on each graph in our collection. This increases the number of subgraphs while reducing (by roughly a factor of k) the total number of edges in these graphs. The number of edges in the new collection is a reliable estimate for the number of edges in the old collection. We will choose k to be a constant so that every sparsification round reduces the number of edges by a constant factor.
If the number of graphs in the collection becomes too large, we reduce it in one of two ways. For the subgraphs with relatively few edges, we exactly count the number of edges using only polylog(n) queries. For the dense subgraphs, we can apply the preceding importance sampling technique and retain only polylog(n) subgraphs. Every basic operation in this scheme requires polylog(n) BIS queries, and the number of subgraphs is polylog(n). Therefore, a round can be implemented using polylog(n) BIS queries. Now, since every round reduces the number of edges by a constant factor, the algorithm terminates after O (log n) rounds, resulting in the desired estimate for m using only polylog(n) queries in total. Figure 1 depicts the main components of one round.
We have glossed over some details regarding the reweighting of intermediate estimates, as both the sparsification and importance sampling steps involve subsampling and rescaling. To handle this, the algorithm will maintain a weight value for each subgraph in the collection (starting with unit weight). Then, these weights will be updated throughout the execution, and they will be used during coarse estimation. For the final estimate, the algorithm will output a weighted sum of the estimates for the remaining subgraphs in addition to the weighted version of the exactly counted subgraphs. By using these weights to properly rescale estimates and counts, the algorithm will achieve a good estimate for m with high probability.

1.3.2 The IS Algorithm.
We move on to describe our second algorithm, based on IS queries. As with the BIS algorithm, the main building block for the IS algorithm is an efficient way to exactly count edges using IS queries. The exact counting algorithm works by first breaking the vertices of the graph into independent sets in a greedy fashion and then grouping these independent sets into larger independent sets using (yet again) a greedy algorithm. The resulting partition of the graph into independent sets has the property that every two sets have an edge between them, and this partition can be computed using a number of queries that is roughly m. This is beneficial, because when working on the induced subgraph on two independent sets, the IS queries can be interpreted as BIS queries. As such, edges between parts of the partition can be counted using the exact counting algorithm, modified to use IS queries. The end result is that for a given set U ⊆ [n], one can compute m(U), the number of edges with both endpoints in U, using O(m(U) log n) IS queries. This algorithm is described in Section 5.1.

Now, we can sparsify the graph to reduce the overall number of IS queries. In contrast to the BIS setting, we do not know how to design a coarse estimator using only IS queries (see Section 5.3). This prohibits us from designing a similar algorithm. Instead, we estimate the number of edges in one shot by coloring the graph with a large number of colors and estimating the number of edges going between a matching of the color classes. This is somewhat counterintuitive. An initial sparsification attempt might be to count only the edges going between a single pair of colors. If the total number of colors is 2k, then we expect to see m/(2k choose 2) edges between this pair. Therefore, we could set k to be large and invoke Lemma 5.3. Scaling by a factor of (2k choose 2), we would hope to get an unbiased estimator for m.
Unfortunately, a star graph demonstrates that this approach does not work, due to the large variance of this estimator. If we randomly color the vertices of the star graph with 2k colors, then out of the (2k choose 2) pairs of color classes, only 2k − 1 pairs have any edge going between them. Thus, if we only choose one pair of color classes, then with high probability one of the following two cases occurs: either (i) there is no edge crossing the color pair or (ii) the number of edges crossing the pair is ≈ m/(2k). In both cases, our estimate after scaling by a factor of (2k choose 2) will be far from the truth.
At the other extreme, most edges will be present if we look at the edges crossing all pairs of color classes. Indeed, the only edges we miss have both endpoints in a single color class, and these account for only a 1/(2k) fraction of the total number of edges in expectation. Thus, this does not achieve any substantial sparsification.
By using a matching of the color classes, we simultaneously get a reliable estimate of the number of edges and a sufficiently sparsified graph (see Lemma 3.2). Let U_1, . . . , U_k, V_1, . . . , V_k be a random partition of the vertices into 2k color classes. Lemma 3.2 implies that with high probability, the estimator 2k Σ_{i=1}^{k} m(U_i, V_i) approximates m up to a factor of 1 ± O((k √m log n)/m). Hence, as long as we choose k to be less than ε√m/polylog(n), we approximate m up to a factor of (1 + O(ε)). We use geometric search to find such a k efficiently.
To get a bound on the number of IS queries, we claim that we can compute Σ_{i=1}^{k} m(U_i, V_i) using Lemma 5.3, with a total of (k + m/k) polylog(n) IS queries. The first term arises since we have to make at least one query for each of the k color pairs (even if there are no edges between them). For the second term, we pay for both (i) the edges between the color classes and (ii) the total number of edges with both endpoints within a color class (since the number of IS queries in Lemma 5.3 scales with m(U ∪ V)). By the sparsification lemma, we know that (i) is bounded by O(m/k) with high probability, and we can prove an analogous statement for (ii). Hence, plugging in a value of k ≈ ε√m/polylog(n), the total number of IS queries is bounded by √m polylog(n)/ε.
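The first stage of the IS algorithm, greedily partitioning the vertices into independent sets, can be sketched as follows. This toy version uses a single first-fit pass, which already guarantees that every two parts have an edge between them (the first vertex of each new part failed the IS test against every earlier part); the article's two-stage greedy achieves the same property within the query bound of Section 5.1. The IS oracle is simulated directly, and all names are illustrative:

```python
def is_independent(graph, S):
    """IS oracle: True iff S induces no edge (graph as adjacency sets)."""
    return all(v not in graph[u] for u in S for v in S if u < v)

def greedy_independent_partition(graph, vertices):
    """First-fit greedy: place each vertex into the first part that
    remains independent (one IS query per test), opening a new part
    otherwise."""
    parts = []
    for v in vertices:
        for part in parts:
            if is_independent(graph, part | {v}):
                part.add(v)
                break
        else:
            parts.append({v})
    return parts

# Example: the path 0-1-2-3 splits into the independent sets {0, 2}, {1, 3}.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
parts = greedy_independent_partition(path, [0, 1, 2, 3])
```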

Subsequent Work After Initial Publication
After the initial publication of our results [7], there has been some follow-up work [9, 10, 13, 16]. Answering one of the open questions of our earlier work [7], Chen et al. [13] provide nearly matching upper and lower bounds on the number of IS queries for edge estimation. More precisely, they show that O(min(n/√m, √m) · poly(log n, 1/ε)) IS queries are sufficient (the term n/√m is the new result). They also prove that Ω(min(n/√m, √m)/polylog(n)) IS queries are necessary for a certain family of graphs.
Dell et al. [16] provide new connections between decision and approximate counting results for problems such as k-SUM, k-Orthogonal-Vectors, and k-Clique by relating the complexity to edge estimation using certain queries. In particular, their work extends the previous work of Dell and Lapinskas [15] to the case of k-hypergraphs, and they consider a generalization of BIS queries to k-partite set queries. As one of their technical results, they improve the dependence on ε in Theorem 4.9 from ε^{-4} down to ε^{-2}.
Bhattacharya et al. [9,10] also consider the generalization of BIS queries to tripartite set queries, where they use such queries to estimate the number of triangles in a graph.

Outline
The rest of the article is organized as follows. We start in Section 2 by reviewing some necessary tools-concentration inequalities, importance sampling, and set size estimation via membership queries. In Section 3, we prove our sparsification result (Lemma 3.2).
In Section 4, we describe the algorithm for edge estimation for the BIS case. Section 4.1 describes the exact counting algorithm. In Section 4.2, we present the algorithm that uses BIS queries to coarsely estimate the number of edges between two subsets of vertices (Lemma 4.8). We combine these building blocks to construct our edge estimation algorithm using BIS queries in Section 4.3.
The case of IS queries is tackled in Section 5. In Section 5.1, we formally present the algorithms to exactly count edges between two subsets of vertices (Lemma 5.3). In Section 5.2, we present our algorithm using IS queries. In Section 5.3, we provide some discussion of why the IS case seems to be harder than the BIS case. We conclude in Section 6 and discuss open questions.

PRELIMINARIES
Here we present some standard tools that we need later on.

Concentration Bounds
For proofs of the following concentration bounds, see the book by Dubhashi and Panconesi [18].
Lemma 2.2 (Chernoff's inequality). Let X_1, . . . , X_r be independent indicator random variables, let X = Σ_{i=1}^{r} X_i, and let μ = E[X]. Furthermore, let ℓ and u be real numbers such that ℓ ≤ μ ≤ u. Then, for any δ ∈ (0, 1), we have that
(A) Pr[|X − μ| ≥ δμ] ≤ 2 exp(−δ^2 μ/3),
(B) Pr[X ≤ (1 − δ)ℓ] ≤ exp(−δ^2 ℓ/2), and
(C) Pr[X ≥ (1 + δ)u] ≤ exp(−δ^2 u/3).

We need a version of Azuma's inequality that takes into account a rare bad event; the following is a restatement of Theorem 8.3 from Chung and Lu [14] in a simplified form (which is sufficient for our purposes).

Lemma 2.3 ([14]). Let f be any function of r independent random variables Y_1, . . . , Y_r that satisfies the Lipschitz condition |f(y) − f(y')| ≤ c_i whenever y and y' differ only in the ith coordinate, where c_1, . . . , c_r are some nonnegative numbers. Let B be the event that a bad sequence of values happened, and let X = f(Y_1, . . . , Y_r). Then, for any λ > 0,

    Pr[ |X − E[X]| ≥ λ ] ≤ 2 exp(−λ^2 / (2 Σ_{i=1}^{r} c_i^2)) + Pr[B].

Importance Sampling
Importance sampling is a technique for estimating a sum of terms. Assume that for each term in the summation, we can cheaply and quickly get an initial, coarse estimate of its value. Furthermore, assume that better estimates are possible but expensive. Importance sampling shows how to sample terms in the summation, then acquire a better estimate only for the sampled terms, to get a good estimate for the full summation. In particular, the number of samples is bounded independently of the original number of terms, depending instead on the coarseness of the initial estimates, the probability of success, and the quality of the final output estimate.

Lemma 2.4 (Importance Sampling). Let U = {u_1, . . . , u_r} be a set of numbers, all contained in the interval [α/b, αb], for α > 0 and b ≥ 1. Let γ, ε > 0 be parameters. Consider the sum Γ = Σ_{i=1}^{r} u_i. For an arbitrary t ≥ (b^4 / (2ε^2)) (1 + ln(1/γ)), and i = 1, . . . , t, let X_i be a random sample chosen uniformly (and independently) from the set U (i.e., let j_i be uniformly and randomly picked from [r], and let X_i = u_{j_i}). Then, the estimate Y = (r/t) Σ_{i=1}^{t} X_i satisfies Pr[ |Y − Γ| ≥ εΓ ] ≤ γ.

The preceding lemma enables us to reduce a summation with many numbers into a much shorter summation (while introducing some error, naturally). The list/summation reduction algorithm we need is described next.
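As an aside, the estimator of Lemma 2.4 is straightforward to simulate: sample t indices uniformly with replacement and rescale by r/t, so the estimate is unbiased for Γ. A minimal sketch with illustrative names:

```python
import random

def importance_sample_sum(values, t, rng):
    """Estimate Γ = sum(values) from t uniform samples with replacement:
    Y = (r/t) * Σ X_i, so E[Y] = Γ."""
    r = len(values)
    return (r / t) * sum(rng.choice(values) for _ in range(t))

rng = random.Random(0)
# r = 100 numbers within a bounded ratio of a common scale (b = 2, α = 1).
values = [0.5 + 1.5 * rng.random() for _ in range(100)]
gamma = sum(values)
estimate = importance_sample_sum(values, 5000, rng)
# estimate is within a small relative error of gamma w.h.p.
```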
To this end, we have parameters ξ > 0, γ, b, and M, bounding, respectively, the approximation quality, the failure probability, the coarseness of the estimates, and the range of the weights. Then, one can compute a new (hopefully shorter) sequence of triples (H_1, w_1, e_1), . . . , (H_t, w_t, e_t) (the new sequence is a subsequence of the original sequence with reweighting). The new sequence complies with the preceding conditions, and furthermore, the estimate derived from the new sequence is a (1 ± ξ)-approximation of the estimate derived from the original one, with probability ≥ 1 − γ.

Proof. We break the interval [1, M] into h = ⌈log M⌉ intervals in the natural way, where the jth interval is [2^{j−1}, 2^j). Let U_j be the set of input triples whose estimates fall in the jth interval. For all j ∈ [h], let Γ_j = Σ_{(H,w,e)∈U_j} w · w(H) be the total weight of structures in the jth group. By Lemma 2.4, we have, with probability ≥ 1 − γ/h, that the rescaled sample from the jth group approximates Γ_j to within a factor of 1 ± ξ. Summing these inequalities over all j ∈ [h] implies that Y is the desired approximation with probability ≥ 1 − γ.
Specifically, the output sequence is constructed as follows. For all j ∈ [h], and for every triple (H, w, e) ∈ R_j, we add (H, w · W_j, e) to the output sequence, where R_j is the sample taken from U_j and W_j is the corresponding rescaling factor. Clearly, the output sequence has the desired form.

Estimating Subset Size via Membership Oracle Queries
We present here a standard tool for estimating the size of a subset via membership oracle queries. This is well known, but we provide the details for the sake of completeness.

Lemma 2.7. Consider two (finite) sets B ⊆ U, where n = |U|. Let ε ∈ (0, 1) and γ ∈ (0, 1/2) be parameters. Let g > 0 be a user-provided guess for the size of |B|. Consider a random sample R, taken with replacement from U, of size r = c_5 ε^{-2} (n/g) log γ^{-1}, where c_5 is a sufficiently large constant. Next, consider the estimate Y = (n/r)|R ∩ B| of |B|. Then, we have the following: (A) if |B| ≥ g/4, then Y is a (1 ± ε)-approximation of |B|; and (B) if |B| < g/4, then Y < g/2. Both of the preceding statements hold with probability ≥ 1 − γ.
Proof. (A) Let X_i be the indicator variable of the event that the ith sample belongs to B, and let X = Σ_{i=1}^{r} X_i, so that Y = (n/r)X. By Chernoff's inequality (Lemma 2.2(A)), the probability that Y is not a (1 ± ε)-approximation of |B| is at most 2 exp(−Ω(ε^2 r|B|/n)), and this is ≤ γ for c_5 a sufficiently large constant.

(B) We have two cases to consider. For the first case, suppose that |B| < g/4. In this case, if X = Σ_{i=1}^{r} X_i is the random variable described in part (A), then each X_i is an indicator variable with probability p = |B|/n < g/(4n), and by Chernoff's inequality (Lemma 2.2(C)), the probability that Y ≥ g/2 is at most exp(−Ω(rg/n)), which is ≤ γ/2 for c_5 ≥ 24 ln 2, as γ ≤ 1/2. Adding the two failure probabilities together gives a bound of at most γ, as required.
Lemma 2.8. Consider two sets B ⊆ U, where n = |U|. Let ξ, γ ∈ (0, 1) be parameters such that γ < 1/log n. Assume that one is given access to a membership oracle that, given an element x ∈ U, returns whether or not x ∈ B. Then, one can compute an estimate s such that (1 − ξ)|B| ≤ s ≤ (1 + ξ)|B|, with probability ≥ 1 − γ.

Proof. Let g_i = n/2^{i+2}. For i = 1, . . . , log n, use the algorithm of Lemma 2.7 with ε = 0.5, with the probability of failure being γ/(8 log n), and let Y_i be the returned estimate. The algorithm stops this loop as soon as Y_i ≥ 4g_i. Let I be the value of i when the loop stopped. The algorithm now calls Lemma 2.7 again with g_I and ε = ξ, and returns the value of Y as the desired estimate.
Overall, for T = 1 + ⌈log n⌉, the preceding makes T calls to the subroutine of Lemma 2.7, and the probability that any of them fails is at most Tγ/(8 log n) < γ. Assume that all invocations of Lemma 2.7 were successful. In particular, Lemma 2.7 guarantees that if Y_I ≥ 4g_I, then |B| ≥ g_I/4, so the estimate returned by the final call is a (1 ± ξ)-approximation of the desired quantity.
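The geometric search of Lemma 2.8 over the guesses g_i = n/2^{i+2}, followed by a final refinement call, can be sketched as follows (the membership oracle is simulated directly; the constant c and all names are illustrative):

```python
import random

def estimate_size(universe_size, member, guess, eps, rng, c=50):
    """One round of Lemma 2.7: draw r = c * (n/g) / eps^2 samples with
    replacement and return the scaled hit count Y = (n/r) * |R ∩ B|."""
    n = universe_size
    r = max(1, int(c * (n / guess) / eps ** 2))
    hits = sum(member(rng.randrange(n)) for _ in range(r))
    return n * hits / r

rng = random.Random(0)
n, B = 10000, set(range(500))                     # |B| = 500
member = lambda x: x in B

# Geometric search (Lemma 2.8): try guesses g_i = n / 2^(i+2) until the
# returned estimate exceeds 4 * g_i, then refine with a smaller eps.
i = 0
while True:
    g = n / 2 ** (i + 2)
    y = estimate_size(n, member, g, 0.5, rng)
    if y >= 4 * g or g < 1:
        break
    i += 1
y_final = estimate_size(n, member, g, 0.1, rng)
# y_final is close to |B| = 500 with high probability
```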

Estimating Subset Size via Emptiness Oracle Queries.
Consider the variant where we are given a set X ⊆ U. Given a query set Q ⊆ U, we have an emptiness oracle that tells us whether Q ∩ X is empty. Using an emptiness oracle, one can get a (1 ± ε)-approximation of the size of X using relatively few queries. The following result is implied by the work of Aronov and Har-Peled [6, Theorem 5.6] and Falahatgar et al. [22]; the latter result has better bounds if the failure probability is not required to be polynomially small.

Lemma 2.9 ([6, 22]). Consider a set X ⊆ U, where n = |U|. Let ε ∈ (0, 1) be a parameter. Assume that one is given access to an emptiness oracle that, given a query set Q ⊆ U, returns whether or not X ∩ Q ≠ ∅. Then, one can compute an estimate s such that (1 − ε)|X| ≤ s ≤ (1 + ε)|X|, using O(ε^{-2} log n) emptiness queries. The returned estimate is correct with probability ≥ 1 − 1/n^{Ω(1)}.
We sketch the basic idea of the algorithm used in the preceding lemma. For a guess g of the size of X, consider a random sample Q where every element of U is picked with probability 1/g. The probability that Q avoids X is α(g) = (1 − 1/g)^{|X|}. The function α(g) is (i) monotonically increasing in g, (ii) close to zero when g ≪ |X|, (iii) ≈ 1/e for g = |X|, and (iv) close to 1 if g ≫ |X|. One can estimate the value α(g) by repeated random sampling and checking if the random sample intersects X using emptiness queries. Given such an estimate, one can then perform an approximate binary search for the value of g such that α(g) = 1/e, which corresponds to g = |X|. See the work of Aronov and Har-Peled [6] and Falahatgar et al. [22] for further details.
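The search just described can be sketched as follows. Note that Q ∩ X is empty exactly when no element of X is sampled into Q, so the simulated emptiness query only needs to examine X; the guess is doubled until the estimated α(g) reaches 1/e (illustrative names, and a crude power-of-two search in place of the approximate binary search):

```python
import math
import random

def alpha_hat(X, g, trials, rng):
    """Estimate α(g) = Pr[Q ∩ X = ∅], where Q keeps each universe
    element independently with probability 1/g.  Since Q ∩ X = ∅ exactly
    when no element of X lands in Q, only X needs to be examined."""
    empty = 0
    for _ in range(trials):
        if all(rng.random() >= 1.0 / g for _ in X):
            empty += 1
    return empty / trials

rng = random.Random(0)
n = 1 << 17
X = set(range(1000))                    # |X| = 1000, universe of size n

# Crude search: double g until the estimated avoidance probability
# reaches 1/e; at that point g is within roughly a factor of 2 of |X|.
g = 1
while g <= n and alpha_hat(X, g, 200, rng) < 1 / math.e:
    g *= 2
```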

EDGE SPARSIFICATION BY RANDOM COLORING
In this section, we prove that coloring the vertices and counting only the edges between specific color classes provides a reliable estimate for the number of edges in the graph. This is distinct from standard graph sparsification algorithms, which usually sparsify the edges of the graph directly (usually by sampling edges).
We need the following technical lemma.
Lemma 3.1. Let C be a set of r elements, colored randomly by k colors; specifically, for every element x ∈ C, one chooses randomly (independently and uniformly) a color for it from the set [k]. For i ∈ [k], let n_i be the number of elements of C with color i. Let n be a positive integer and c > 1 be an arbitrary constant. Then, with probability at least 1 − 4/n^c, every counter n_i deviates from its expectation r/k by at most O(√((r/k) c log n) + c log n).

Proof. (A) For ℓ ∈ [r], let X_ℓ be the indicator variable that is 1 with probability 1/k and 0 otherwise. For X = Σ_{ℓ=1}^{r} X_ℓ, notice that n_i is distributed identically to X and that E[X] = E[n_i] = r/k. Using Chernoff's inequality (Lemma 2.2(A)), we have the desired concentration bound.

Lemma 3.2. (A) There exists an absolute constant ς such that the following holds. For every n, let G = ([n], E) be a graph with m edges. For any 1 ≤ k ≤ ⌊n/2⌋, let U_1, . . . , U_{2k} be a uniformly random partition of [n]. Then, with probability at least 1 − 1/n^{Ω(1)},

    | m(G) − 2k Σ_{i=1}^{k} m(U_i, U_{k+i}) | ≤ ς k √m log n.

(B) There exists an absolute constant ς such that the following holds. Similarly, for every n, disjoint sets U, V ⊆ [n], and k such that 2 ≤ k ≤ max{|U|, |V|}, let U_1, . . . , U_k and V_1, . . . , V_k be uniformly random partitions of U and V, respectively. Then, with probability at least 1 − 1/n^{Ω(1)},

    | m(U, V) − k Σ_{i=1}^{k} m(U_i, V_i) | ≤ ς k √(m(U, V)) log n.
Fix two distinct colors i, j ∈ 2k , and let To see why the preceding is true, observe that any edge involving two vertices in t − 1 has the same contribution to д(i) and д(j). Similarly, an edge with a vertex in t − 1 , and a vertex in t + 1 : n , has the same contribution to both terms. The same argument holds for an edge involving vertices with indices strictly larger than t. As such, only the edges adjacent to t have a different contribution, which is as stated. Rearranging, we have by Lemma 3.1 with C = N (t ) and r = deg(t ), with probability at least 1 − β for β = 4/n c for any constant c > 1, that . Furthermore, we have that k · Γ is a (1 ± ξ )approximation to m(G), where ξ = (ςk √ m log n)/m, with high probability. For our purposes, we need Setting k = 4, the preceding implies that one can apply the refinement algorithm of Lemma 3.2 if m = Ω(ε −2 log 4 n). With high probability, the number of edges in the new k subgraphs (i.e., Γ), scaled by k, is a good estimate (i.e., within a 1 ± ε/(8 log n) factor) for the number of edges in the original graph, and furthermore, the number of edges in the new subgraphs is small (formally, E[Γ] ≤ m/4, and with high probability Γ ≤ m/2).

EDGE ESTIMATION USING BIS QUERIES
Here we show how to obtain exact and approximate counts of the number of edges in a graph using BIS queries (see also [5, 30]). Proof. We use a recursive divide-and-conquer approach, which intuitively builds a quadtree over the pair (U, V). Specifically, consider the incidence matrix M of size |U| × |V|, where a column corresponds to an element of V and a row to an element of U. An entry in the matrix is equal to 1 if there is an edge between the corresponding nodes in the original graph, and it is zero otherwise. The task at hand, as such, is to count the number of ones in the matrix. A BIS query then corresponds to deciding whether an induced submatrix is all zero. We now conceptually build a tree (i.e., a quadtree) by partitioning the matrix into four submatrices of the same dimensions (in the natural way) and recursively building a quadtree for each submatrix. Intuitively, the algorithm counts the 1s in the matrix by tracking each of the 1s to its corresponding leaf node in the quadtree.

Exactly Counting Edges Using BIS Queries
To this end, the algorithm first issues the query BIS(U, V). If the return value is false, then there are no edges between U and V, and the algorithm sets m(U, V) to zero and returns. If |U| = |V| = 1, then the query also determines whether m(U, V) is 0 or 1, and the algorithm returns. The remaining case is that m(U, V) ≠ 0, and the algorithm recurses on the four children of (U, V), which correspond to the pairs (U_1, V_1), (U_1, V_2), (U_2, V_1), and (U_2, V_2), where U_1, U_2 and V_1, V_2 are equipartitions of U and V, respectively. We are using here the identity m(U, V) = m(U_1, V_1) + m(U_1, V_2) + m(U_2, V_1) + m(U_2, V_2). If m(U, V) = 0 holds, then the number of queries is exactly equal to 1, and the lemma is true in this case. For the rest of the proof, we assume that m(U, V) ≥ 1. To bound the number of queries, imagine building the whole quadtree for the adjacency matrix of U × V with entries for E(U, V). Let X be the set of 1 entries in this matrix, and let k = |X| (i.e., X corresponds to the set of leaves that are labeled 1 in the quadtree). The height of the quadtree is h = O(max{log |U|, log |V|}). Let X_1 be the set of nodes in the quadtree that are either in X or are ancestors of nodes of X. It is not hard to verify that |X_1| = O(k + k log(|U||V|)) = O(k log n). Finally, let X_2 be the set of nodes in the quadtree that are either in X_1 or whose parent is in X_1. Clearly, the algorithm visits only the nodes of X_2 in the recursion, thus implying the desired bound.
As for the budgeted version, run the algorithm until it has accumulated T = Θ(t/log n) edges in the working set. If this never happens, then the number of edges of the graph is at most T, as desired, and the preceding analysis applies. Otherwise, the algorithm stops, and applying the same argument as before, we get that the number of BIS queries is bounded by O(T log n) = O(t).
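The recursion just described can be sketched as follows; the BIS oracle is simulated from an explicit edge set, and all names are illustrative rather than the paper's notation.

```python
# Sketch of the quadtree counter (Lemma 4.1); BIS is simulated from an
# explicit edge set, so oracle behavior and names are assumptions.
def make_bis_oracle(edges):
    edge_set = {frozenset(e) for e in edges}
    def bis(u_side, v_side):
        # True iff at least one edge crosses between the two disjoint sides.
        return any(frozenset((u, v)) in edge_set for u in u_side for v in v_side)
    return bis

def count_edges(bis, u_side, v_side):
    """Exactly count the edges between disjoint u_side and v_side."""
    if not bis(u_side, v_side):
        return 0          # one query certifies an all-zero submatrix
    if len(u_side) == 1 and len(v_side) == 1:
        return 1          # the query above certified this single pair is an edge
    um, vm = max(1, len(u_side) // 2), max(1, len(v_side) // 2)
    # m(U, V) = m(U1, V1) + m(U1, V2) + m(U2, V1) + m(U2, V2).
    return sum(count_edges(bis, ua, vb)
               for ua in (u_side[:um], u_side[um:])
               for vb in (v_side[:vm], v_side[vm:])
               if ua and vb)
```

A pair with no crossing edge costs one query, matching the charging argument above.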
Remark. The number of BIS queries made by the algorithm of Lemma 4.1 is at least max{m(U , V ), 1}, since every edge with one endpoint in U and the other in V is identified (on its own, explicitly) by such a query.
We note that we can use the above algorithm to exactly identify the edges of an arbitrary graph using BIS queries with a cost of O(log n) overhead per edge (see also [5, 30]). However, we will not need to do so in the sequel. We remain with the task of computing Z. Let U_0 = [n]. For i = 1, . . . , T = ⌈log_2 n⌉, let A_i be the elements of U_{i−1} whose i-th bit in their binary representation is 1.
Observe that every edge e in G has an index i such that its two endpoints differ in the i-th bit. Note that either one of the endpoints of e was already added to Z before the i-th iteration, or it would be discovered and its endpoints added to Z in the i-th iteration. As such, the set Z is computed correctly. Since E_1, . . . , E_T are disjoint sets, it follows that computing all of them, using the algorithm of Lemma 4.1, requires O((m + log n) log n) BIS queries in total. For the budgeted version, we run the algorithm until τ = Θ(t) BIS queries have been performed. If this does not happen, then the graph has at most τ edges, and they were reported by the algorithm. Otherwise, we know that the graph must have at least τ/log n edges, as desired.
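A minimal sketch of the bit-splitting idea: since every edge has a bit position where its endpoints differ, running the bipartite edge-recovery procedure on each bipartition (A_i, [n] \ A_i) recovers every edge. This simplified version re-discovers an edge once per separating bit rather than maintaining the shrinking sets U_i, so it illustrates correctness, not the query-optimal variant; the oracle simulation is an assumption.

```python
# find_edges lists the edges crossing a bipartition via the recursive BIS
# procedure; iterating over bit positions then recovers the whole edge set.
def find_edges(bis, u_side, v_side):
    if not bis(u_side, v_side):
        return []
    if len(u_side) == 1 and len(v_side) == 1:
        return [(u_side[0], v_side[0])]
    um, vm = max(1, len(u_side) // 2), max(1, len(v_side) // 2)
    return [e for ua in (u_side[:um], u_side[um:])
              for vb in (v_side[:vm], v_side[vm:]) if ua and vb
              for e in find_edges(bis, ua, vb)]

def all_edges_via_bits(bis, n):
    found = set()
    for i in range(max(1, n.bit_length())):
        # Vertices whose i-th bit is 1 versus those where it is 0: every
        # edge uv is separated by some bit position where u and v differ.
        a_side = [v for v in range(n) if (v >> i) & 1]
        b_side = [v for v in range(n) if not (v >> i) & 1]
        for u, v in find_edges(bis, a_side, b_side):
            found.add(frozenset((u, v)))
    return found
```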

The Coarse Estimator Algorithm
Let G = ([n], E) be a graph, and let U, V ⊆ [n] be disjoint subsets of the vertices. The task at hand is to estimate m(U, V) using polylog(n) BIS queries.
For a subset S ⊆ [n], define N(S) to be the union of the neighbors of all vertices in S. For a vertex v, let deg_S(v) denote the number of neighbors of v that lie in S. For i ∈ {0, 1, . . . , ⌊log n⌋}, define the set of vertices in U with degree between 2^i and 2^{i+1} as U_i = {u ∈ U | 2^i ≤ deg_V(u) < 2^{i+1}}. Since m(U, V) = Σ_i Σ_{u ∈ U_i} deg_V(u), there is an index α such that Σ_{u ∈ U_α} deg_V(u) ≥ m(U, V)/(log n + 1); the first inequality is stating that there is a term as large as the average. As for the second inequality, observe that for every i, Σ_{u ∈ U_i} deg_V(u) < |U_i| · 2^{i+1}. Suppose that we have an estimate e for the number of edges between U and V in the graph. Consider the test CheckEstimate, depicted in Algorithm 4.1, for checking if the estimate e is correct up to polylogarithmic factors using a logarithmic number of BIS queries.

By a union bound over the loop variable values, the probability that the test accepts is at most 1/4. (B)
It is enough to show that the probability is at least 1/2 when the loop variable attains the value α given by Claim 4.5. In this case, we have that |U_α| ≥ m(U, V)/(2^{α+1}(log n + 1)), and thus the sample U' contains a vertex of U_α with the required probability, since n ≥ 16. Furthermore, since deg_V(u) ≥ 2^α for all u ∈ U_α, it follows that when U' ∩ U_α ≠ ∅, we have |N(U' ∩ U_α)| ≥ 2^α. Thus, we can bound the probability that the sampled set V' intersects N(U' ∩ U_α), and from the preceding, the test accepts with probability at least 1/2.

Armed with the preceding test, we can easily estimate the number of edges up to an O(log n) factor by doing a search, where we start with e = n^2 and halve the estimate in each iteration. The algorithm is depicted in Algorithm 4.2.

Proof. For any fixed value of the loop variable j such that 2^j ≥ 4m(U, V)(log n + 1), the expected number of accepts is at most t/4, using Claim 4.6(A), where t = 128 log n. The probability that we see at least 3t/8 = t/4 + t/8 accepts is bounded by exp(−2(t/8)^2/t) = exp(−t/32) ≤ n^{−4} by Chernoff's inequality (Lemma 2.2(A)). Taking the union over all values of j, the probability that the algorithm returns 2^j, when 2^j ≥ 4m(U, V)(log n + 1), is at most 2n^{−4} log n.
However, when 2^j ≤ m(U, V)/(4 log n), the expected number of accepts is at least t/2, by Claim 4.6(B), and so the probability that we see at least 3t/8 = t/2 − t/8 accepts is at least 1 − exp(−2(t/8)^2/t) = 1 − exp(−t/32) ≥ 1 − n^{−4} by Chernoff's inequality (Lemma 2.2(A)). Hence, conditioned on the event that the algorithm has not already returned a bigger value of j, the probability that we accept for the unique j with m(U, V)/(8 log n) < 2^j ≤ m(U, V)/(4 log n) is at least 1 − n^{−4}. Overall, by a union bound, the probability that the estimator outputs an estimate e that does not satisfy (8 log n)^{−1} ≤ e/m(U, V) ≤ 8 log n is at most 4n^{−4} log n. The number of BIS queries is bounded by O(log^3 n), since for each value of j there are t = 128 log n trials of CheckEstimate, each of which makes log n + 1 queries to the BIS oracle.
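The outer halving search can be sketched as follows, with CheckEstimate abstracted as a black-box probabilistic test `check`; the trial count t and the 3t/8 acceptance threshold follow the analysis above, while the simulated check in the test harness is an assumption.

```python
import math, random

# Sketch of the halving search: start from e = n^2, run t trials of the
# check per candidate, and return the first candidate with >= 3t/8 accepts.
def coarse_estimator(n, check, t=None):
    t = t or 128 * max(1, math.ceil(math.log(n)))
    e = n * n
    while e >= 1:
        accepts = sum(1 for _ in range(t) if check(e))
        if accepts >= (3 * t) // 8:   # enough accepts: e is coarsely correct
            return e
        e //= 2                       # otherwise halve the candidate estimate
    return 0
```

With a check that accepts overwhelmingly once e is within a constant factor of m, the search returns an estimate off by at most that factor times the halving step.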
Summarizing the preceding, we get the following result.

Lemma 4.8. For n ≥ 16, and arbitrary disjoint U, V ⊆ [n], the randomized algorithm CoarseEstimator(U, V) makes at most c_ce log^3 n BIS queries (for a constant c_ce) and outputs e ≤ n^2 such that, with probability at least 1 − 4n^{−4} log n, we have (8 log n)^{−1} ≤ e/m(U, V) ≤ 8 log n.

The Overall BIS Approximation Algorithm
Given a graph G = ([n], E), we next describe an algorithm that makes polylog(n)/ε^4 BIS queries to estimate the number of edges in the graph within a factor of (1 ± ε).
The algorithm for estimating the number of edges in the graph maintains a data structure D containing the following: (A) An accumulator φ. This is a counter that maintains an estimate of the number of edges already handled.
The estimate, based on D, of the number of edges in the original graph G = ([n], E) is the accumulator φ together with the (appropriately scaled) number of active edges in D, denoted m_active. Next, the algorithm uses the summation reduction algorithm of Lemma 2.5, applied to the list of triples in D, with ξ = ε/(8 log n). This reduces the number of triples in D to at most L_len while introducing a multiplicative error of (1 ± ξ).

Analysis. Number of iterations.
Initially, the number of active edges is at most m. Every time Refine is executed, this number reduces by a factor of 2 with high probability using Lemma 3.2(B) (in expectation, the reduction is by a factor of 4). As such, after log m ≤ log n^2 = 2 log n iterations, there are no active edges, and the algorithm terminates.
Number of BIS queries. Clearly, because Reduce is used on D in each iteration, the algorithm maintains the invariant that the number of triples in D is at most O(L_len), where L_len = O(ε^{−2} log^8 n), as specified by Remark 2.6.
The procedure Cleanup applies the algorithm of Lemma 4.1 to decide whether a triple in the list has at least 2L_small edges associated with it, or fewer, where L_small = Θ(ε^{−2} log^4 n).
Error analysis. Inside each iteration, Cleanup introduces no error. By the choice of parameters, Refine introduces a multiplicative error of at most 1 ± ξ (see Remark 3.3). Similarly, Reduce introduces a multiplicative error bounded by 1 ± ξ (see Remark 2.6). As such, the multiplicative approximation of the algorithm lies in the interval [1 − ε, 1 + ε], since (1 − ε/(8 log n))^{1+2 log n} ≥ 1 − ε and (1 + ε/(8 log n))^{1+2 log n} ≤ 1 + ε, as easy calculations show.
Probability of success. Throughout this analysis, c will be a constant that can be chosen to be arbitrarily large. The algorithm may fail due to the following reasons: (i) the random 2-coloring in step (B) gives an estimate that is far from its expectation (this probability is at most 1/n c using Lemma 3.2(A)), (ii) the Refine step fails (the probability for the failure of each iteration is at most 1/n c using Lemma 3.2(B)), (iii) the coarse estimate in Reduce step fails (the probability for the failure of each iteration is at most 1/n c using Claim 4.7), and last (iv) the summation reduction in the Reduce step fails (the probability for the failure of each iteration is at most 1/n c using Lemma 2.5). Overall, every step performed by the algorithm had probability at most 1/n c to fail. The algorithm performs O (polylog(n)) steps with high probability, which implies that the algorithm succeeds with probability at least 1 − 1/n O (1) .
Proof. Let N(v) = {u | vu ∈ E} be the set of neighbors of v, and let E_v = {vu | vu ∈ E} be the corresponding set of edges, so that deg(v) = |N(v)| = |E_v|. For a query set Q ⊆ [n] \ {v}, deciding whether N(v) ∩ Q = ∅ is equivalent to deciding whether any of the edges adjacent to v has its other endpoint in Q, and this is answered by the BIS query ({v}, Q). Namely, the BIS oracle can function as an emptiness oracle for N(v) ⊆ [n]. Now, using the algorithm of Lemma 2.9, we can (1 ± ε)-approximate |N(v)| using O(ε^{−2} log n) queries, as claimed.
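The reduction can be sketched as follows. Here `is_empty(Q)` stands for one BIS({v}, Q) query, and the estimator, which subsamples Q at rate p and inverts the emptiness probability (1 − p)^deg(v), is one standard way to estimate cardinality from an emptiness oracle; it is not necessarily the exact algorithm of Lemma 2.9.

```python
import math, random

# Degree estimation from an emptiness oracle: subsampling Q at rate p gives
# Pr[N(v) and Q disjoint] = (1 - p)^deg(v), which we invert empirically.
def estimate_degree(n, is_empty, p, trials, rng):
    empties = sum(1 for _ in range(trials)
                  if is_empty([u for u in range(n) if rng.random() < p]))
    rate = max(empties / trials, 1.0 / trials)   # clamp to avoid log(0)
    return math.log(rate) / math.log(1.0 - p)    # solve (1 - p)^d = rate for d
```

In practice one would search for a rate p where the emptiness probability is bounded away from 0 and 1, which is what drives the O(ε^{−2} log n) query bound.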

EDGE ESTIMATION USING IS QUERIES
This section describes and analyzes our IS query algorithm (Theorem 5.8). At the end, we also discuss limitations of IS queries, suggesting that IS queries may indeed be weaker than BIS queries.

Exactly Counting Edges Using IS Queries
We start with an exact edge counting algorithm for IS queries (see also [5, 30]). At a high level, we use Lemma 4.1 after efficiently computing a suitable decomposition of our graph. Proof. Since U and V are disjoint and independent, we have that m(U ∪ V) = m(U, V). Furthermore, for any U' ⊆ U and V' ⊆ V, the query BIS(U', V') is equivalent to the query IS(U' ∪ V'). As such, we can use the algorithm of Lemma 4.1, using the IS queries as a replacement for the BIS queries, yielding the result. The next step is to break the set of interest U into independent sets. Proof. Order the elements of U = {u_1, . . . , u_k} arbitrarily. The idea is to break U into independent sets, where each independent set is an interval I_j = {u_{i_j}, u_{i_j + 1}, . . . , u_{i_{j+1} − 1}}. This can be done in a greedy fashion from left to right, discovering the index where an interval stops being an independent set. Assume inductively that one has computed the first j such independent intervals I_1, . . . , I_j, and also that I_j ∪ {u_{i_{j+1}}} is not an independent set. Next, using binary search on the range {i_{j+1} + 1, . . . , k}, find the maximal β such that {u_{i_{j+1}}, . . . , u_β} is independent.
For any j, we have m(I_j, I_{j+1}) ≥ 1, which implies that the number of computed intervals τ satisfies τ ≤ m(U) + 1. As such, this stage uses O((1 + m(U)) log n) IS queries. This results in a decomposition of U into τ independent sets I_1, . . . , I_τ.
In the second stage, starting with the computed collection of independent sets, the algorithm greedily tries to merge sets. In each step, the algorithm takes two independent sets B,W in the current collection (for which it might be possible that their merged set is independent), and the algorithm uses an IS query to check whether B ∪ W is an independent set. If it is, then the algorithm merges the two sets into one independent set (replacing B,W by the set B ∪ W in the current collection of sets). Otherwise, the algorithm marks the two sets B and W as being incompatible with each other. Note that if B,W are incompatible, then for any B ⊇ B and W ⊇ W , the sets B and W are also incompatible. Namely, incompatibility is preserved under merger of independent sets, and the algorithm can keep track of the incompatible pairs under merger (importantly, a merger cannot decrease the number of incompatible pairs). The algorithm stops when all current sets are pairwise incompatible.
Each merge of two independent sets can be charged to the number of independent sets decreasing by one. Each pair of sets that is discovered to be incompatible can be charged to the edge witnessing that the merged set is not independent. Since every edge is only charged once by this process, it follows that the total number of IS queries performed by the second stage of the algorithm is at most τ + m(U ) ≤ 2m(U ) + 1.
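The two stages can be sketched as follows, with the IS oracle simulated from an explicit edge set. For brevity, this sketch re-tests incompatible pairs instead of tracking them as the charging argument above does, so it illustrates correctness rather than the query bound.

```python
# Sketch of the decomposition (Lemma 5.2). is_indep(S) stands for one IS
# query on S; all names and the oracle simulation are assumptions.
def decompose(is_indep, order):
    # Stage 1: greedily cut `order` into maximal independent intervals,
    # binary-searching for the largest right endpoint of each interval.
    sets, start = [], 0
    while start < len(order):
        lo, hi = start + 1, len(order)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if is_indep(order[start:mid]):
                lo = mid
            else:
                hi = mid - 1
        sets.append(order[start:lo])
        start = lo
    # Stage 2: repeatedly merge any two sets whose union stays independent.
    merged = True
    while merged:
        merged = False
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if is_indep(sets[i] + sets[j]):
                    sets[i] = sets[i] + sets[j]
                    del sets[j]
                    merged = True
                    break
            if merged:
                break
    return sets
```

On a path 0-1-2-3-4, stage 1 produces singletons and stage 2 merges them into the two color classes, which are pairwise incompatible.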
The resulting collection of independent sets has the desired properties, completing the proof. Proof. Using the algorithm of Lemma 5.2, compute the decomposition of U into independent sets V_1, . . . , V_t. By construction, for any i < j, we have that m(V_i, V_j) ≥ 1, as some vertex of V_i is connected to some vertex in V_j. As such, going over all 1 ≤ i < j ≤ t, compute the set of edges E(V_i, V_j) using the algorithm of Lemma 5.1. This requires O(m(V_i, V_j) log n) IS queries. As such, the total number of IS queries used by this algorithm is O(m(U) log n). The budgeted version follows by running the algorithm until c · t log n IS queries have been performed, for c a sufficiently large constant. If this happens, then the number of edges in the graph is larger than t (as otherwise, the preceding implies that the algorithm would have already terminated), and the algorithm stops and outputs this fact.

Algorithms for Edge Estimation Using IS Queries
Our IS algorithm has two main subroutines. We first describe and analyze these, then we combine them for the overall algorithm, which is presented in Theorem 5.8.

Growing Search.
The following is an immediate consequence of Lemma 5.3. Proof. We color the vertices in U randomly using k = ⌈tε/(ς log n)⌉ colors, for a constant ς to be specified shortly, and let U_1, . . . , U_k be the resulting partition. By Lemma 3.2, we have for the quantity Γ = Σ_{i=1}^{k} m(U_i) that |k · Γ − m(U)| ≤ ς √(m(U)) k log n, and this holds with probability ≥ 1 − n^{−c_3}, where c_3 is an arbitrarily large constant, and ς is a constant that depends only on c_3. For this to be a (1 ± ε)-approximation, we need that ς √(m(U)) k log n ≤ ε · m(U). This in turn is equivalent to √(m(U)) ≥ ς k log n/ε = t, which holds because of the assumption that m(U) ≥ max{L_base, t^2} in the statement.
To proceed, the algorithm starts computing the terms in the summation defining Γ, using the algorithm of Lemma 5.3. If at any point in time the summation exceeds M = 8(t^2/k) = O(ε^{−1} t log n), then the algorithm stops and reports that m(U) > 2t^2. Otherwise, the algorithm returns the computed count k · Γ as the desired approximation. In both cases, we are correct with high probability by Lemma 3.2.
We now bound the number of IS queries. If the algorithm computed Γ by determining exact edge counts m(U_i) for all i ∈ [k], then the number of queries would be O(Σ_{i=1}^{k} (1 + m(U_i)) log n) = O((k + Γ) log n), by Lemma 5.3. Proof. The algorithm starts by checking whether the number of edges m(U) is at most L_base = O(ε^{−4} log^4 n), using the algorithm of Lemma 5.4. Otherwise, in the i-th iteration, the algorithm sets t_i = √2 · t_{i−1}, where t_0 = √L_base, and invokes the algorithm of Lemma 5.5 with t_i as the threshold parameter. If the algorithm succeeds in approximating the right size, we are done. Otherwise, we continue to the next iteration. Let α be the first iteration with t_α > 4√(m(U)). Taking a union bound over the iterations, we have that, with high probability, the algorithm stops by iteration α. The number of IS queries performed by the algorithm is O(Σ_{i=1}^{α} t_i ε^{−2} log^2 n) = O(t_α ε^{−2} log^2 n), since this is a geometric sum.

Shrinking Search.
We are given a graph G = ([n], E) and a set U ⊆ [n]. The task at hand is to approximate m(U). Let N = |U|.
Given an oracle that can answer IS queries, we can decide if a specific edge uv exists in the set E(U) by performing an IS query on {u, v}. We can treat such IS queries as membership oracle queries in the set E of edges in the graph, where the ground set is the set of all possible edges Z = {uv | u < v and u, v ∈ U}, with |Z| = N(N − 1)/2. Invoking the algorithm of Lemma 2.8 in this case, with γ = 1/n^{O(1)}, implies a (1 ± ε)-approximation to m(U) using O((N^2/m(U)) ε^{−2} log n) IS queries. For our purposes, however, we need a budgeted version of this. Lemma 5.7. Given parameters t > 0, ξ ∈ (0, 1], and a set U ⊆ [n], with N = |U|, an algorithm can either (a) report that m(U) ≤ N^2/(2t) or (b) return a (1 ± ξ)-approximation to m(U). The algorithm uses O(t log n) IS queries in case (a) and O(t ξ^{−2} log n) IS queries in case (b). The returned result is correct with high probability.
Proof. The idea is to use the sampling as done in Lemma 2.7, with g = N^2/(16t) and ε = 1/2, on the set of edges E(U) ⊆ Z. The sample R used is of size O((N^2/g) log n) = O(t log n), and we check, for each one of the sampled edges, whether it is in the graph by using an IS query. If the returned estimate is at most g/2, then the algorithm returns that it is in case (a).
Otherwise, we invoke the algorithm of Lemma 2.7 again, with ε = ξ, to get the desired approximation, which is case (b). Combining the two bounds on the IS queries, we get that the i-th iteration uses O(t_i ε^{−2} log^2 n) IS queries.
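The sampling step can be sketched as follows; `is_edge(u, v)` stands for one IS query on the pair {u, v}, and the single fixed sample size is a simplification of the two-stage scheme above.

```python
import random

# Pair-sampling estimator: IS queries on pairs act as membership queries
# into E(U) over the ground set of all N(N-1)/2 pairs; scale the hit rate.
def sample_estimate(universe, is_edge, samples, rng):
    n = len(universe)
    total_pairs = n * (n - 1) // 2
    hits = sum(1 for _ in range(samples) if is_edge(*rng.sample(universe, 2)))
    return hits / samples * total_pairs   # scale the hit rate up to m(U)
```

The sample size needed for a fixed relative error scales with total_pairs/m(U), which is why this route is only efficient once the search has certified that m(U) is large.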

The Overall IS Search Algorithm.
The algorithm stops in the i-th iteration if t_i ≥ √m/2 or t_i ≥ n^2/m. In particular, for the stopping iteration I, we have t_I = O(min(√m, n^2/m)). As such, the total number of IS queries in all iterations except the last one is bounded by O(Σ_{i=1}^{I} t_i ε^{−2} log^2 n) = O(t_I ε^{−2} log^2 n). The stopping iteration uses O(t_I ε^{−2} log^2 n) IS queries. Each bound holds with high probability, and a union bound implies the same for the final result. Corollary 5.9. For a graph G = ([n], E), with access to G via IS queries, and a parameter ε > 0, one can (1 ± ε)-approximate m using O(ε^{−4} log^5 n + n^{2/3} ε^{−2} log^2 n) IS queries.
Proof. Follows readily as min(√m, n^2/m) ≤ n^{2/3}, for any value of m between 1 and n^2.

Limitations of IS Queries
In this section, we discuss several ways in which IS queries seem more restricted than BIS queries.
Simulating degree queries with IS queries. A degree query can be simulated by O (log n) BIS queries (see Lemma 4.10). In contrast, here we provide a graph instance where Ω(n/deg(v)) IS queries are needed to simulate a degree query. In particular, we show that IS queries may be no better than edge existence queries for the task of degree estimation. Since it is easy to see that Ω(n/deg(v)) edge existence queries are needed to estimate deg(v), this lower bound also applies to IS queries.
For the lower bound instance, consider a graph that is a clique along with a separate vertex v whose neighbors are a subset of the clique. We claim that IS queries involving v are essentially equivalent to edge existence queries. Any edge existence query can be simulated by an IS query. However, any IS query on the union of v and at least two clique vertices will always detect a clique edge. Thus, the only informative IS queries involve exactly two vertices.
Coarse estimator with IS queries. It is natural to wonder if it is possible to replace the coarse estimator (Lemma 4.8) with an analogous algorithm that makes polylog(n) IS queries. This would immediately imply an algorithm making polylog(n)/ε^4 IS queries that estimates the number of edges. We do not know if this is possible, but one barrier is a graph consisting of a clique U on O(√m) vertices along with a set V of n − O(√m) isolated vertices. We claim that for this graph, the algorithm CoarseEstimator(U, V) from Section 4.2, using IS queries instead of BIS queries, will output an estimate that differs from m by a factor of Θ(n^{1/3}). Consider the execution of CheckEstimate(U, V, e) from Algorithm 4.1. A natural way to simulate this with IS queries would be to use an IS query on U' ∪ V' instead of a BIS query on (U', V'). Assume for the sake of argument that m = n^{4/3} and |U| = √m = n^{2/3}. Consider when the estimate e satisfies e = cn^{5/3} for a small constant c. In the CheckEstimate execution, there will be a value i = Θ(log n) such that, with constant probability, the sample U' ⊆ U will contain at least two vertices and V' ⊆ V will contain at least one vertex. In this case, m(U' ∪ V') ≠ 0 even though m(U', V') = 0. Thus, using IS queries will lead to incorrectly accepting on such a sample, and this would lead to CoarseEstimator outputting the estimate e = Θ(n^{5/3}) even though the true number of edges is m = n^{4/3}.

CONCLUSION
In this article, we explored the task of using either BIS or IS queries to estimate the number of edges in a graph. We presented randomized algorithms giving a (1 ± ε)-approximation using polylog(n)/ε^4 BIS queries and min{n^2/(ε^2 m), √m/ε} · polylog(n) IS queries. Our algorithms estimate the number of edges by first sparsifying the original graph and then exactly counting edges spanning certain bipartite subgraphs. Next, we describe a few open directions for future research.

Open Directions
Open questions include using a polylogarithmic number of BIS queries to estimate the number of cliques in a graph (see the work of Eden et al. [20] for an algorithm using degree, neighbor, and edge existence queries) or to sample a uniformly random edge (see the work of Eden and Rosenbaum [21] for an algorithm using degree, neighbor, and edge existence queries). In general, any graph estimation problems may benefit from BIS or IS queries, possibly in combination with standard queries (e.g., neighbor queries). Finally, it would be interesting to know what other oracles, besides subset queries, enable estimating graph parameters with a polylogarithmic number of queries.