On Triangle Estimation Using Tripartite Independent Set Queries

Estimating the number of triangles in a graph is one of the most fundamental problems in sublinear algorithms. In this work, we provide an algorithm that approximately counts the number of triangles in a graph using only polylogarithmic queries when the number of triangles on any edge in the graph is polylogarithmically bounded. Our query oracle Tripartite Independent Set (TIS) takes three disjoint sets of vertices A, B and C as inputs, and answers whether there exists a triangle having one endpoint in each of these three sets. Our query model generally belongs to the class of group queries (Ron and Tsur ACM Trans. Comput. Theory 8(4), 15, 2016; Dell and Lapinskas 2018) and in particular is inspired by the Bipartite Independent Set (BIS) query oracle of Beame et al. (2018). We extend the algorithmic framework of Beame et al., with TIS replacing BIS, for approximately counting triangles in graphs.


Notations, the Query Model, the Problem and the Result
We denote the set {1, . . ., n} by [n]. Let V(G), E(G) and T(G) denote the set of vertices, edges and triangles in the input graph G, respectively. When the graph G is explicit, we may write only V, E and T for the set of vertices, edges and triangles. Let t(G) = |T(G)|. The statement "A, B, C are disjoint" means that A, B, C are pairwise disjoint. For three non-empty disjoint sets A, B, C ⊆ V(G), G(A, B, C), termed a tripartite subgraph of G, denotes the induced subgraph of A ∪ B ∪ C in G minus the edges having both endpoints in A, in B or in C. t(A, B, C) denotes the number of triangles in G(A, B, C). We use the triplet (a, b, c) to denote the triangle having a, b, c as its vertices. Let t_u denote the number of triangles having u as one of its vertices. Let t_{(u,v)} be the number of triangles having (u, v) as one of its edges, and let Δ_E = max_{(u,v) ∈ E(G)} t_{(u,v)}. For a set U, "U is COLORED with [n]" means that each member of U is assigned a color out of [n] colors independently and uniformly at random. Let E[X] and V[X] denote the expectation and variance of a random variable X. For an event E, E^c denotes the complement of E. The statement "a is a (1 ± ε)-multiplicative approximation of b" means |b − a| ≤ ε · b. Next, we describe the query oracle.
Definition 1 (Tripartite independent set oracle (TIS)) Given three non-empty disjoint subsets V_1, V_2, V_3 ⊆ V(G) of a graph G, the TIS query oracle answers 'YES' if and only if t(V_1, V_2, V_3) ≠ 0, that is, if and only if there is a triangle with one vertex in each of V_1, V_2 and V_3.
Notice that the query oracle looks only at those triangles that have vertices in all of the sets V_1, V_2, V_3. The TRIANGLE-ESTIMATION problem is to report a (1 ± ε)-multiplicative approximation of t(G), where the input is V(G), the TIS oracle for graph G, and ε ∈ (0, 1).
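To make Definition 1 concrete, here is a minimal brute-force sketch of a TIS oracle. It assumes the graph is available explicitly as a set of undirected edges, which is not the case in the query model (there the oracle is a black box); the helper names `make_edges` and `tis` are hypothetical and not taken from the paper.

```python
from itertools import product


def make_edges(pairs):
    """Undirected edge set stored as frozensets, so {u, v} and {v, u} coincide."""
    return {frozenset(p) for p in pairs}


def tis(edges, V1, V2, V3):
    """Brute-force TIS oracle (Definition 1), for illustration only.

    Returns True ('YES') iff some triangle (a, b, c) has a in V1, b in V2, c in V3.
    """
    if not (V1 and V2 and V3):
        raise ValueError("TIS is defined only for non-empty sets")
    if V1 & V2 or V2 & V3 or V1 & V3:
        raise ValueError("TIS requires pairwise disjoint sets")
    for a, b, c in product(V1, V2, V3):
        # (a, b, c) is a triangle iff all three of its edges are present
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges:
            return True
    return False


# Triangle (1, 2, 3) plus a pendant edge (3, 4).
E = make_edges([(1, 2), (2, 3), (1, 3), (3, 4)])
print(tis(E, {1}, {2}, {3}))     # True
print(tis(E, {1}, {2}, {4}))     # False
print(tis(E, {1, 4}, {2}, {3}))  # True
```

The internal cost of this stand-in is |V_1||V_2||V_3| edge lookups; the algorithms in this paper count only oracle calls, not the work done inside the oracle.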
Theorem 2 (Main Result) Let G be a graph with Δ_E ≤ d and |V(G)| = n ≥ 64. For any ε > 0, TRIANGLE-ESTIMATION can be solved using O(d^2 log^18 n / ε^4) many TIS queries with probability at least 1 − O(1)/n^2.
Note that the query complexity stated in Theorem 2 is poly(log n, 1/ε) even if d is O(log^c n), where c is a positive constant. We reiterate that the only bound we require is on the number of triangles on an edge; we require neither a bound on the maximum degree of the graph nor a bound on the number of triangles incident on a vertex.

Query Models and TIS
Query models for graphs are essentially of two types: Local Queries, and Group (and related) queries.
Local Queries This query model was initiated by Feige [17] and Goldreich and Ron [18] and used even recently by [15,16]. The queries on the graphs are (i) Degree query: the oracle reports the degree of a vertex; (ii) Neighbor query: the oracle reports the i-th neighbor of v, if it exists; and (iii) Edge existence query: the oracle reports whether there exists an edge between a given pair of vertices.
Group Queries or Subset Queries and Subset Samples These queries were implicitly initiated in the works of Stockmeyer [29,30] and formalized by Ron and Tsur [28].
Group queries can be viewed as a generalization of membership queries in sets. The essential idea of group queries is to estimate the size of an unknown set S ⊆ U by using a YES/NO answer from the oracle to whether S intersects a query set T ⊆ U; in the subset sample query, the oracle additionally returns a uniformly selected item of S ∩ T if S ∩ T ≠ ∅. Subset sample queries are therefore at least as powerful as group queries.
The cut query of Rubinstein et al. [27], though motivated by the submodular function minimization problem, can also be seen in the light of group queries: we seek the number of edges that have one endpoint in each of the two vertex sets forming a cut. Choi and Kim [11] used a variation of group queries for graph reconstruction. Dell and Lapinskas [12] essentially used this class of queries for estimating the number of edges in a bipartite graph. Bipartite independent set (BIS) queries for a graph, initiated by Beame et al. [7], can also be seen in the light of group queries. A BIS query provides a YES/NO answer to the existence of an edge between V_1 and V_2, where V_1 and V_2 are disjoint. A subset sample version of the BIS oracle was used in [6].
In TIS, we seek a YES/NO answer about the existence of an intersection between the set of triangles that we want to estimate and the set of triples having one vertex in each of three disjoint sets of vertices. Thus TIS, like BIS, belongs to the class of group queries. A bone of contention for any newly introduced query oracle is its worth. Beame et al. [7] gave a subjective justification in favor of BIS to establish it as a query oracle. It is easy to verify that TIS, being in the same class of group queries, has the same interesting connections to group testing and computational geometry as BIS. We provide justifications in favor of considering Δ_E ≤ d in Appendix A. Intuitively, TIS is to triangle counting what BIS is to edge estimation.

Prior Works
Eden et al. [15] showed that the query complexity of estimating the number of triangles in a graph G using local queries is Θ*(n/t^{1/3} + min{m, m^{3/2}/t}), where n, m and t denote the numbers of vertices, edges and triangles of G. Matching upper and lower bounds on k-clique counting in G in the local query model have also been reported [16]. These results have almost closed the k-clique counting problem in graphs using local queries. A precursor to triangle estimation in graphs is edge estimation. The number of edges in a graph G can be estimated by using O*(n/√m) degree and neighbor queries, and Ω(n/√m) queries are necessary to estimate the number of edges even if we allow all three local queries [18]. This result would have almost closed the edge estimation problem, were it not for revisiting the problem with stronger query models in the hope of using only a polylogarithmic number of queries. Beame et al. [7] did precisely that by estimating the number of edges in a graph using O(log^14 n / ε^4) bipartite independent set (BIS) queries. Motivated by this result, we explore whether triangle estimation can be solved using only polylogarithmically many TIS queries.
Note that TRIANGLE-ESTIMATION can also be thought of as the HYPEREDGE ESTIMATION problem in a 3-uniform hypergraph. As a follow-up to this paper, Dell et al. [13] and Bhattacharya et al. [4] independently generalized our result to c-uniform hypergraphs, where c ∈ N is a constant. Their results showed that the bound on Δ_E is not necessary to solve TRIANGLE-ESTIMATION using polylogarithmically many TIS queries.

Organization of the Paper
We give a broad overview of the algorithm in Section 2. Section 3 gives the details of sparsification. In Section 4, we give the exact and approximate estimation algorithms with respect to a threshold. Section 5 discusses the algorithm for coarse estimation of the number of triangles. The final algorithm is given in Section 6. Section 7 concludes the paper with some discussion of future improvements. Appendix A provides justifications in favor of TIS. Appendix B has the probabilistic results used in this paper.

Overview of the Algorithm
Our algorithmic framework is inspired by [7], but the detailed analysis is markedly different; for instance, we use a relatively new concentration inequality, due to Janson [21], for handling sums of random variables with bounded dependency. Apart from Lemmas 6 and 9, all other proofs require different ideas.
In Fig. 1, we give a flowchart of the algorithm and show the corresponding lemmas that support its steps. The main idea of our algorithm is as follows. For a given G, we can determine whether the number of triangles t(G) is greater than a threshold τ (Lemma 4). If t(G) ≤ τ, i.e., G is sparse in triangles, we compute a (1 ± ε)-approximation of t(G) (Lemma 4). Otherwise, we sparsify G to get a disjoint union of tripartite subgraphs of G that maintain t(G) up to a scaling factor (Lemma 3). For each tripartite subgraph, if the subgraph is sparse (decided by Lemma 5), we count the number of triangles exactly (Lemma 6). Otherwise, we sparsify again (Lemma 7). This repeated sparsification may create a huge number of tripartite subgraphs. Counting the number of triangles in them is managed by a coarse estimation (Lemma 8) followed by taking a sample of the subgraphs that approximately maintains the number of triangles (Lemma 9). Each time we sparsify, we ensure that the sum of the numbers of triangles in the subgraphs generated by sparsification is a constant fraction of the number of triangles in the graph before sparsification, which makes the number of iterations O(log n).
Fig. 1 Flow chart of the algorithm. The highlighted texts indicate the basic building blocks of the algorithm. We also indicate the corresponding lemmas that support the building blocks.
We sparsify G by considering the partition obtained when V(G) is COLORED with [3k]. This sparsification is done such that (i) the sparsified graph is a union of a set of vertex-disjoint tripartite subgraphs, and (ii) a proper scaling of the number of triangles in the sparsified graph is a good estimate of t(G) with high probability. The proof of the sparsification result stated next uses the method of averaged bounded differences and a Chernoff-Hoeffding type inequality in the bounded dependency setting by Janson [21]. The detailed proof is in Section 3. Recall that Δ_E is the maximum number of triangles on a particular edge.
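The sparsification idea can be simulated on a small explicit graph. The exact definition of π(i) (Section 3) is not recoverable from this copy, so the sketch below assumes, hypothetically, that the 3k colors are grouped into k consecutive blocks of three; under that assumption a triangle is properly colored with probability k · 3!/(3k)^3 = 2/(9k^2), which for k = 1 is the 2/9 used in Section 4. Triangles are counted by brute force, not through TIS, and all function names are illustrative.

```python
import random
from itertools import combinations


def triangles(n, edges):
    """All triangles (a, b, c), a < b < c, of a graph on vertices 0..n-1 (brute force)."""
    return [
        (a, b, c)
        for a, b, c in combinations(range(n), 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges
    ]


def properly_colored(tri, color):
    """Assumed reading of Definition 11: colors are 1..3k, grouped into the blocks
    {1,2,3}, {4,5,6}, ...; a triangle is properly colored iff its three vertices
    receive the three distinct colors of a single block."""
    cols = {color[v] for v in tri}
    return len(cols) == 3 and len({(c - 1) // 3 for c in cols}) == 1


def sparsified_estimate(n, edges, k, rng):
    """COLOR V(G) with [3k], count properly colored triangles f, and rescale.

    Each triangle is properly colored with probability 2 / (9k^2), so
    (9k^2 / 2) * f is an unbiased estimate of t(G).
    """
    color = {v: rng.randint(1, 3 * k) for v in range(n)}
    f = sum(properly_colored(t, color) for t in triangles(n, edges))
    return 9 * k * k * f / 2


# The complete graph K6 has C(6, 3) = 20 triangles.
K6 = {frozenset(p) for p in combinations(range(6), 2)}
rng = random.Random(0)
trials = [sparsified_estimate(6, K6, 1, rng) for _ in range(10000)]
print(sum(trials) / len(trials))  # should be close to 20
```

A single coloring gives a noisy estimate; Lemma 3 is what quantifies how close one rescaled count is to t(G) when Δ_E is bounded.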

Lemma 3 (General Sparsification) Let k, d ∈ N. There exists a constant κ_1 such that for any graph G with …
We apply the sparsification corresponding to Lemma 3 only when t(G) is above a threshold, to ensure that the relative error is bounded. We can decide whether t(G) is at most the threshold and, if so, estimate the value of t(G) using the following lemma, whose proof is given in Section 4.
Lemma 4 (Estimation with respect to a threshold) There exists an algorithm that, for any graph G, a threshold parameter τ ∈ N and an ε ∈ (0, 1), determines whether t(G) > τ. If t(G) ≤ τ, the algorithm gives a (1 ± ε)-approximation to t(G) by using O(τ log^2 n / ε^2) many TIS queries with probability at least 1 − n^{-10}.
Assume that t(G) is large and G has undergone sparsification. We initialize a data structure with the set of vertex-disjoint tripartite graphs obtained after the sparsification step. For each tripartite graph G(A, B, C) in the data structure, we check whether t(A, B, C) is less than a threshold using the algorithm corresponding to Lemma 5. If so, we compute the exact value of t(A, B, C) using Lemma 6 and remove G(A, B, C) from the data structure. The proofs of Lemmas 5 and 6 are given in Section 4.

Lemma 5 (Threshold for Tripartite Graph)
There exists a deterministic algorithm that given any disjoint subsets A, B, C ⊂ V (G) of any graph G and a threshold parameter τ ∈ N, can decide whether t (A, B, C) ≤ τ using O(τ log n) TIS queries.

Lemma 6 (Exact Counting in Tripartite Graphs) There exists a deterministic algorithm that given any disjoint subsets A, B, C ⊂ V (G) of any graph G, can determine the exact value of t (A, B, C) using O(t (A, B, C) log n) TIS queries.
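The tree procedure behind Lemmas 5 and 6 (Section 4) can be sketched as below, assuming a brute-force TIS stand-in on an explicit edge set. Splitting each non-singleton set into two halves is our assumption about the partition step, and the optional `budget` mimics the early stop of Lemma 5 (for example, budget = 16τ log n); all names are hypothetical.

```python
from itertools import product


def tis(edges, A, B, C):
    """Brute-force stand-in for the TIS oracle on an explicit edge set."""
    return any(
        {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges
        for a, b, c in product(A, B, C)
    )


def halve(S):
    """Split S into one or two non-empty parts whose sizes differ by at most one."""
    items = sorted(S)
    mid = (len(items) + 1) // 2
    return [set(part) for part in (items[:mid], items[mid:]) if part]


def count_tripartite(edges, A, B, C, budget=None):
    """Count t(A, B, C) using only TIS calls, one call per node of the search tree.

    Returns (count, queries). If `budget` is given and the tree would need more
    than `budget` queries, returns (None, queries), meaning "more than tau".
    """
    stack, count, queries = [(set(A), set(B), set(C))], 0, 0
    while stack:
        U, V, W = stack.pop()
        if budget is not None and queries >= budget:
            return None, queries
        queries += 1
        if not tis(edges, U, V, W):
            continue                  # node labelled 0: no triangle below it
        if len(U) == len(V) == len(W) == 1:
            count += 1                # singleton node labelled 1: exactly one triangle
            continue
        # Refine: every triple (u, v, w) lies in exactly one child.
        stack.extend(product(halve(U), halve(V), halve(W)))
    return count, queries


K5 = {frozenset((i, j)) for i in range(5) for j in range(i + 1, 5)}
print(count_tripartite(K5, {0, 1}, {2, 3}, {4}))            # (4, 5)
print(count_tripartite(K5, {0, 1}, {2, 3}, {4}, budget=2))  # (None, 2)
```

Only nodes that the oracle labels 1 are expanded, so the number of queries grows with t(A, B, C) times the depth, which is O(log n) when the sets are halved.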
Now we are left with some tripartite graphs, each containing more triangles than the threshold. If the number of such graphs is not large, then we sparsify each tripartite graph G(A, B, C) in a fashion similar to the earlier sparsification. This sparsification result, formally stated in the following lemma, has a proof similar to that of Lemma 3. We replace G(A, B, C) by a constant number (say, k) of tripartite subgraphs formed after sparsification.
Lemma 7 (Sparsification for Tripartite Graphs) Let k, d ∈ N. There exists a constant κ_2 such that … where A, B and C are disjoint subsets of V(G) for any graph G with Δ_E ≤ d, and A_1, . . ., A_k, B_1, . . ., B_k and C_1, . . ., C_k are the partitions of A, B, C, respectively, formed uniformly at random.
If we have a large number of vertex-disjoint tripartite subgraphs of G and each subgraph contains a large number of triangles, then we coarsely estimate the number of triangles in each subgraph, correct up to an O(log^3 n) factor, by using the algorithm corresponding to the following lemma, whose proof is in Section 5. Our COARSE-ESTIMATE algorithm is similar in structure to the coarse estimation algorithm for edge estimation, but requires a more careful analysis.

Lemma 8 (Coarse Estimation) There exists an algorithm that, given disjoint subsets A, B, C ⊂ V(G) of any graph G, returns an estimate t̂ satisfying … with probability at least 1 − n^{-9}. Moreover, the query complexity of the algorithm is O(log^4 n).
After estimating the number of triangles in each subgraph coarsely, we approximately maintain the triangle count using the following sampling result, which is a direct consequence of the Importance Sampling Lemma of [7] (for the exact statement of the Importance Sampling Lemma, see Lemma 29 in Appendix B).

Lemma 9 ([7]) Let (A_1, B_1, C_1, w_1), . . ., (A_r, B_r, C_r, w_r) be the tuples present in the data structure and let e_i be the corresponding coarse estimate for t(A_i, B_i, C_i), such that (i) w_i, e_i ≥ 1 for all i ∈ [r]; (ii) e_i/ρ ≤ t(A_i, B_i, C_i) ≤ e_i ρ for some ρ > 0 and all i ∈ [r]; and (iii) Σ_{i=1}^r w_i · t(A_i, B_i, C_i) ≤ M. Note that the exact values t(A_i, B_i, C_i) are not known to us. Then there exists an algorithm that finds (A'_1, B'_1, C'_1, w'_1), . . ., (A'_s, B'_s, C'_s, w'_s) such that, with probability at least 1 − δ, all of the above three conditions hold and |Σ_{i=1}^s w'_i · t(A'_i, B'_i, C'_i) − S| ≤ λS, where S = Σ_{i=1}^r w_i · t(A_i, B_i, C_i) and λ, δ > 0. Also, s = O(ρ^4 log M (log log M + log(1/δ)) / λ^2).

Now again, for each tripartite graph G(A, B, C), we check whether t(A, B, C) is less than a threshold using the algorithm corresponding to Lemma 5. If so, we compute the exact value of t(A, B, C) using Lemma 6 and remove G(A, B, C) from the data structure. Otherwise, we iterate on all the required steps discussed above, as shown in Fig. 1. Observe that each iteration uses polylogarithmically many queries. Now, note that the number of triangles reduces by a constant factor after each sparsification step. So, the number of iterations is bounded by O(log n). Hence, the query complexity of our algorithm is polylogarithmic. This completes the high-level description of our algorithm.
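For intuition about why coarse estimates suffice, the sketch below shows a generic importance-sampling estimator over weighted tuples. It is not the algorithm of [7] behind Lemma 9, which additionally bounds the sample size s and guarantees conditions (i)-(iii) with probability 1 − δ; `importance_sample` and its `threshold` knob are hypothetical. Rescaling each kept weight by 1/p_i keeps the estimator unbiased whatever the e_i are; accurate coarse estimates (small ρ) are what keep its variance small.

```python
import random


def importance_sample(tuples, threshold, rng):
    """Generic unbiased importance sampling over weighted tuples (illustrative).

    tuples: list of (item, w, e) with w, e >= 1, where e is a coarse estimate of
    the unknown count c(item). Tuple i is kept with probability
    p_i = min(1, w_i * e_i / threshold) and its weight becomes w_i / p_i, so
    E[sum of w'_i * c(item_i) over kept tuples] = sum of w_i * c(item_i).
    """
    kept = []
    for item, w, e in tuples:
        p = min(1.0, w * e / threshold)
        if rng.random() < p:
            kept.append((item, w / p, e))
    return kept


# Hidden counts c_i = 1..50 repeated ten times; here the coarse estimate e_i equals c_i.
counts = [(i % 50) + 1 for i in range(500)]
tuples = [(c, 1, c) for c in counts]      # the item doubles as its own hidden count
S = sum(counts)                           # 12750
rng = random.Random(42)
estimates = []
for _ in range(1000):
    sample = importance_sample(tuples, threshold=100, rng=rng)
    estimates.append(sum(w * c for c, w, _ in sample))
print(S, sum(estimates) / len(estimates))
```

A larger `threshold` keeps fewer tuples at the price of a higher variance, which mirrors the trade-off between s and λ in Lemma 9.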

Sparsification Step
In this Section, we prove Lemma 3. The proof of Lemma 7 is similar.
Let V(G) be COLORED with [3k], and let V_1, . . ., V_{3k} be the resulting partition of V(G). Let Z_i be the random variable that denotes the color assigned to the i-th vertex. For i ∈ [3k], π(i) is a set of three colors with i ∈ π(i), defined as follows: …
Definition 11 A triangle (a, b, c) is said to be properly colored if there exists a bijection, in terms of coloring, from {a, b, c} to π(i) for some i ∈ [3k].
Note that f is the number of triangles that are properly colored. The probability that a triangle is properly colored is k · 3!/(3k)^3 = 2/(9k^2). Let us focus on the instance when vertices 1, . . ., t − 1 are already colored and we are going to color vertex t. Let S_ℓ (S_r) be the set of triangles in G having t as one of the vertices and the other two vertices from [t − 1] ([n] \ [t]). Let S_ℓr be the set of triangles in G such that t is a vertex and the second and third vertices are from [t − 1] and [n] \ [t], respectively.
Given that the vertex t is colored with color c ∈ [3k], let N^c_ℓ, N^c_r and N^c_ℓr be the random variables that denote the number of properly colored triangles in S_ℓ, S_r and S_ℓr, respectively. Also, let E_t f denote the absolute difference in the conditional expectation of the number of properly colored triangles when the t-th vertex is (possibly) differently colored. By considering the triangles in S_ℓ, S_r and S_ℓr separately, we can bound E_t f as follows: … Now, consider the following claim, which we prove later. … Let c_t = 15d t log n. From the above claim, we have … Using the method of averaged bounded differences [14] (see Lemma 24 in Appendix B), we have … To finish the proof of Lemma 3, we need to prove Claim 12. For that, we need the following definition and an intermediate result (Lemma 14), stated in terms of objects, which in the current context can be thought of as vertices.
Definition 13 Let X be a set of u objects COLORED with [3k]. Let α, β ∈ [3k] with α ≠ β. A pair of objects {a, b} is said to be colored with {α, β} if there is a bijection, in terms of coloring, from {a, b} to {α, β}. An object o ∈ X is colored with {α, β} if o is colored with α or β.
Recall Definition 11. A triangle incident on t is properly colored if the pair of vertices in the triangle other than t is colored with π(Z_t) \ {Z_t}. Note that Claim 12 bounds the difference in the number of properly colored triangles incident on t when Z_t = a_t and when Z_t = a'_t, that is, the difference between the number of triangles whose pair of vertices other than t is colored with π(a_t) \ {a_t} and the number of those whose pair is colored with π(a'_t) \ {a'_t}. As a vertex can be present in many pairs, the proper coloring of one triangle incident on t depends on the proper coloring of another. However, this dependency is bounded due to our assumption Δ_E ≤ d. Now, let us consider the following lemma.
Lemma 14 Let X be a set of u objects COLORED with [3k]. Let F be a set of v pairs of objects such that an object is present in at most d (d ≤ v) many pairs, and let P ⊆ X be a set of w objects. Let F_{α,β} ⊆ F be the set of pairs of objects that are colored with {α, β}, and let M_{α,β} = |F_{α,β}|. Let P_{α,β} ⊆ P be the set of objects that are colored with {α, β}, and let N_{α,β} = |P_{α,β}|. Then, we have …
Proof … X_i and X_j are dependent if and only if {a_i, b_i} ∩ {a_j, b_j} ≠ ∅. As each object can be present in at most d many pairs of objects, there are at most 2d many X_j's on which an X_i depends. Now, using the Chernoff-Hoeffding type bound in the bounded dependency setting [14] (see Lemma 28 in Appendix B), we have … Similarly, one can also show that … Hence, …
So, we need to bound E[|X|] to prove the claim. The random variables X_i and X_j are dependent if and only if {a_i, b_i} ∩ {a_j, b_j} ≠ ∅. As each object can be present in at most d many pairs of objects, there are at most 2d many X_j's on which an X_i depends. Observe that P(…), and recalling the fact that each X_i depends on at most 2d many other X_j's, we get …
Let P = {o_1, . . ., o_w} be the set of w objects. Let X_i, i ∈ [w], be the indicator random variable such that X_i = 1 if and only if o_i is colored with {α, β}. Note that N_{α,β} = Σ_{i=1}^w X_i. Observe that E[X_i] = 2/(3k), and hence E[N_{α,β}] = 2w/(3k). Note that X_i and X_j are independent for i ≠ j. Applying Hoeffding's inequality (see Lemma 25 in Appendix B), we get P(|N_{α,β} − 2w/(3k)| ≥ 2√(w log u)) ≤ 2/u^8. Similarly, we can also show that P(|N_{α',β'} − 2w/(3k)| ≥ 2√(w log u)) ≤ 2/u^8. Hence, P(|N_{α,β} − N_{α',β'}| ≥ 4√(w log u)) ≤ 4/u^8. We will now give the proof of Claim 12.

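Proof of Claim 12
(a) Let S_ℓ = {(a_1, b_1, t), . . ., (a_v, b_v, t)}. Note that v ≤ t_t. As Δ_E ≤ d, each vertex in [n] can be present in at most d many pairs of S_ℓ. Now we apply Lemma 14. Set X = [n] and F = S_ℓ in Lemma 14. Observe that N^{a_t}_ℓ = M_{π(a_t)\{a_t}} and N^{a'_t}_ℓ = M_{π(a'_t)\{a'_t}}. So, by Lemma 14 (i), P(|N^{a_t}_ℓ − N^{a'_t}_ℓ| ≥ 8√(dv log n)) ≤ 4/n^8. This implies P(|N^{a_t}_ℓ − N^{a'_t}_ℓ| ≥ 8√(d t_t log n)) ≤ 4/n^8.
(b) Let S_r = {(t, a_1, b_1), . . ., (t, a_v, b_v)}. Note that v ≤ t_t, the number of triangles incident on vertex t. As Δ_E ≤ d, each vertex in [n] can be present in at most d many pairs of S_r. Now we apply Lemma 14. Set X = [n] and F = S_r in Lemma 14. Observe that N^{a_t}_r = M_{π(a_t)\{a_t}} and N^{a'_t}_r = M_{π(a'_t)\{a'_t}}. The bound then follows as in (a).
(c) Let S_ℓr = {(a_1, t, b_1), . . ., (a_w, t, b_w)}. Without loss of generality, assume that a_i ∈ [t − 1] and b_i ∈ [n] \ [t]. Note that w ≤ t_t. Given that the vertex t is colored with color c and we know Z_1, . . ., Z_{t−1}, define the set P^c as
P^c := {(a, t, b) ∈ S_ℓr : t is colored with c and P((a, t, b) is properly colored) > 0}.
Let Q^c = |P^c|. Observe that for (a, t, b) ∈ S_ℓr, P((a, t, b) is properly colored) > 0 if and only if a is colored with some color in π(c) \ {c}. Now we apply Lemma 14. Set X = [n] and P = {a_1, . . ., a_w}. Observe that P_{π(a_t)\{a_t}} = P^{a_t} and P_{π(a'_t)\{a'_t}} = P^{a'_t}. By (iii) of Lemma 14, …
Let E be the event that |Q^{a_t} − Q^{a'_t}| ≥ 4√(w log n). So, P(E) ≤ 4/n^8. Assume that E has not occurred. Let P = P^{a_t} ∩ P^{a'_t} = {(x_1, t, y_1), . . ., (x_q, t, y_q)}. Note that q ≤ w ≤ t_t. Recall that Z_x is the random variable that denotes the color assigned to vertex x ∈ [n]. Let X_i, i ∈ [q], be the random variable such that … Observe that X_i and X_j are dependent if and only if y_i = y_j. As Δ_E ≤ d, there can be at most d many y_j's such that y_i = y_j. So, an X_i depends on at most d many other X_j's.
Observe that P(…), and using the fact that each X_i depends on at most d many other X_j's, we get …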

Estimation: Exact and Approximate
In this Section, we prove Lemmas 4 (restated as Lemma 17), 5 (restated as Lemma 16) and 6 (restated as Lemma 15). We first prove Lemmas 5 and 6, whose proofs are very similar. Then we prove Lemma 4, which in turn uses Lemma 5.

Lemma 15 (Lemma 6 restated) There exists a deterministic algorithm that given any disjoint subsets A, B, C ⊂ V (G) of any graph G, can determine the exact value of t (A, B, C) using O(t (A, B, C) log n) TIS queries.
Proof We initialize a tree T with (A, B, C) as the root. We build the tree such that each node is labeled with either 0 or 1. If t(A, B, C) = 0, we label the root with 0 and terminate. Otherwise, we label the root with 1 and do the following as long as there is a leaf node (U, V, W) labeled with 1.
(i) If t(U, V, W) = 0, then we label (U, V, W) with 0 and go to another leaf node labeled 1, if any. Otherwise, we label (U, V, W) with 1 and do the following.
(ii) We partition U, V and W into U_1 and U_2; V_1 and V_2; W_1 and W_2, respectively, such that …
Let T′ be the tree obtained after deleting all the leaf nodes in T. Observe that
• t(A, B, C) is the number of leaf nodes in T′; and
• the query complexity of the above procedure is bounded by the number of nodes in T, as we make at most one query per node of T.
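The number of nodes in T′, which is the number of internal nodes of T, is bounded by 2t(A, B, C) log n. So, the number of leaf nodes in T is at most 16t(A, B, C) log n, and hence the total number of nodes in T is at most 16t(A, B, C) log n. Putting everything together, the required query complexity is O(t(A, B, C) log n).
Lemma 16 (Lemma 5 restated) There exists a deterministic algorithm that, given any disjoint subsets A, B, C ⊂ V(G) of any graph G and a threshold parameter τ ∈ N, can decide whether t(A, B, C) ≤ τ using O(τ log n) TIS queries.
Proof The algorithm proceeds similarly to the one presented in the proof of Lemma 6, initializing a tree T with (A, B, C) as the root. If t(A, B, C) ≤ τ, then we can find t(A, B, C) by using 16t(A, B, C) log n many queries, and the number of nodes in T is bounded by 16t(A, B, C) log n. So, if the number of nodes in T exceeds 16τ log n at any point during the execution of the algorithm, we report t(A, B, C) > τ and terminate. Hence, the query complexity is bounded by the number of nodes in T, which is O(τ log n).
Lemma 17 (Lemma 4 restated) There exists an algorithm that, for any graph G, a threshold parameter τ ∈ N and an ε ∈ (0, 1), determines whether t(G) > τ. If t(G) ≤ τ, the algorithm gives a (1 ± ε)-approximation to t(G) by using O(τ log^2 n / ε^2) many TIS queries with probability at least 1 − n^{-10}.
Proof We show that Algorithm 1 (THRESHOLD-APPROX-ESTIMATE) satisfies the conditions in the statement of Lemma 4. Note that THRESHOLD-APPROX-ESTIMATE calls the algorithm corresponding to Lemma 5 at most N = 18 log n / ε^2 times, where each call can be executed by O(τ log n) TIS queries. So, the total query complexity of THRESHOLD-APPROX-ESTIMATE is O(τ log^2 n / ε^2). Now, we show the correctness of THRESHOLD-APPROX-ESTIMATE. If there exists an i ∈ [N] such that t(A_i, B_i, C_i) > τ, then we report t(G) > τ and QUIT. Otherwise, by Lemma 5, we have the exact values of the t(A_i, B_i, C_i)'s. We will be done by showing that t̂ is a (1 ± ε)-approximation to t(G) with probability at least 1 − n^{-10}. From the description of the algorithm, each triangle in G is counted in t(A_i, B_i, C_i) with probability 2/9. We have E[t(A_i, B_i, C_i)] = 2t(G)/9, so the expectation of the sum is E[Σ_{i=1}^N t(A_i, B_i, C_i)] = 2N t(G)/9, and the estimate t̂ = (9/(2N)) Σ_{i=1}^N t(A_i, B_i, C_i) satisfies E[t̂] = t(G). Therefore, we have P(|t̂ − t(G)| > ε t(G)) = P(|Σ_{i=1}^N t(A_i, B_i, C_i) − 2N t(G)/9| > 2εN t(G)/9). To bound the above probability, we apply Hoeffding's inequality (see Lemma 25 in Appendix B) along with the fact that 0 ≤ t(A_i, B_i, C_i) ≤ τ for all i ∈ [N], and we get …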
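A minimal sketch of THRESHOLD-APPROX-ESTIMATE as described in this proof, assuming an explicit list of triangles and base-2 logarithms. Each t(A_i, B_i, C_i) is computed by brute force rather than through Lemma 5, and any further steps of Algorithm 1 not recoverable from the text above are omitted; the function names are hypothetical.

```python
import math
import random
from itertools import combinations


def three_colored_count(tris, color):
    """t(A_i, B_i, C_i) for the 3-coloring: triangles whose vertices get 3 distinct colors."""
    return sum(len({color[v] for v in t}) == 3 for t in tris)


def threshold_approx_estimate(n, tris, tau, eps, rng):
    """Sketch of Algorithm 1 on an explicit triangle list.

    Repeats N = ceil(18 log2(n) / eps^2) times: COLOR V(G) with [3] and compute
    t(A_i, B_i, C_i) exactly (brute force here; via Lemma 5 in the paper).
    Returns None for "t(G) > tau", else (9 / 2N) * sum_i t(A_i, B_i, C_i).
    """
    N = math.ceil(18 * math.log2(n) / eps ** 2)
    total = 0
    for _ in range(N):
        color = [rng.randrange(3) for _ in range(n)]
        t_i = three_colored_count(tris, color)
        if t_i > tau:
            return None  # report t(G) > tau and quit
        total += t_i
    # Each triangle survives a 3-coloring with probability 3!/3^3 = 2/9.
    return 9 * total / (2 * N)


tris = list(combinations(range(6), 3))  # the 20 triangles of K6
print(threshold_approx_estimate(6, tris, tau=1000, eps=0.2, rng=random.Random(7)))
print(threshold_approx_estimate(6, tris, tau=0, eps=0.2, rng=random.Random(7)))
```

With τ = 1000 the first call returns an average close to t(K6) = 20; with τ = 0 some coloring almost surely exceeds the threshold, and the second call reports None.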

Coarse Estimation
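We now prove Lemma 8. Algorithm 3 (COARSE-ESTIMATE) corresponds to Lemma 8, and Algorithm 2 (VERIFY-ESTIMATE) is a subroutine in Algorithm 3 that determines whether a given estimate t̂ is correct up to an O(log^2 n) factor. Lemmas 18 and 19 are intermediate results needed to prove Lemma 8.
Proof of Lemma 18 Let T(A, B, C) denote the set of triangles having vertices a ∈ A, b ∈ B and c ∈ C, where A, B and C are disjoint subsets of V(G). For (a, b, c) ∈ T(A, B, C) such that a ∈ A, b ∈ B, c ∈ C, let X^{ij}_{(a,b,c)} denote the indicator random variable such that X^{ij}_{(a,b,c)} = 1 if and only if (a, b, c) ∈ T(A_ij, B_ij, C_ij), and let X_ij = Σ_{(a,b,c) ∈ T(A,B,C)} X^{ij}_{(a,b,c)}. As X_ij ≥ 0, … Now, using the fact that t̂ ≥ 64t(A, B, C) log^3 n, we have … Observe that VERIFY-ESTIMATE accepts if and only if there exist i, j ∈ {0, . . ., log n} such that X_ij ≠ 0. Using the union bound, we get …
Proof of Lemma 19 For p ∈ {0, . . ., 2 log n}, let A_p ⊆ A be the set of vertices such that, for each a ∈ A_p, the number of triangles of the form (a, b, c) with (b, c) ∈ B × C lies between 2^p and 2^{p+1} − 1. For a ∈ A_p and q ∈ {0, . . ., log n}, let B_pq(a) ⊆ B be the set of vertices such that, for each b ∈ B_pq(a), the number of triangles of the form (a, b, c) with c ∈ C lies between 2^q and 2^{q+1} − 1. We need the following claim to proceed further. …
Proof of Claim 20 Observe that t(A, B, C) = Σ_{p=0}^{2 log n} t(A_p, B, C), as the sum takes into account all incidences of vertices in A. So, there exists p ∈ {0, . . ., 2 log n} such that t(A_p, B, C) ≥ t(A, B, C)/(2 log n + 1). From the definition of A_p, t(A_p, B, C) < |A_p| · 2^{p+1}. Hence, there exists p ∈ {0, . . ., 2 log n} such that …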
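We come back to the proof of Lemma 19. We will show that VERIFY-ESTIMATE accepts with probability at least 1/5 when the loop executes for i = p, where p is such that … The existence of such a p is evident from Claim 20 (i). Recall that A_pq ⊆ A, B_pq ⊆ B and C_pq ⊆ C are the samples obtained when the loop variables i and j in Algorithm 2 attain values p and q, respectively. Observe that … Now, using the fact that t̂ ≤ t(A, B, C)/(32 log n) and n ≥ 64, … Assume that A_pq ∩ A_p ≠ ∅ and a ∈ A_pq ∩ A_p. By Claim 20 (ii), there exists q ∈ {0, . . ., log n} such that |B_pq(a)| ≥ 2^p/(2^{q+1}(log n + 1)). Note that q depends on a. Observe that we will be done if we can show that VERIFY-ESTIMATE accepts when the loop executes for i = p and j = q. Now, … Assume that A_pq ∩ A_p ≠ ∅, B_pq ∩ B_pq(a) ≠ ∅ and b ∈ B_pq ∩ B_pq(a). Let S be the set such that (a, b, s) is a triangle in G for each s ∈ S. Note that |S| ≥ 2^q. So, … Observe that VERIFY-ESTIMATE accepts if t(A_pq, B_pq, C_pq) ≠ 0. Also, t(A_pq, B_pq, C_pq) ≠ 0 if A_pq ∩ A_p ≠ ∅, B_pq ∩ B_pq(a) ≠ ∅ and C_pq ∩ S ≠ ∅. Hence, …

Lemma 21 (Lemma 8 restated) There exists an algorithm that, given disjoint subsets A, B, C ⊂ V(G) of any graph G, returns an estimate t̂ satisfying …
Proof Note that an execution of COARSE-ESTIMATE for a particular t̂ repeats VERIFY-ESTIMATE ℓ = 2000 log n times and gives output t̂ if more than ℓ/10 of these executions accept. For a particular t̂, let X_i be the indicator random variable such that X_i = 1 if and only if the i-th execution of VERIFY-ESTIMATE accepts, and let X = Σ_{i=1}^ℓ X_i. COARSE-ESTIMATE gives output t̂ if X > ℓ/10. … From the description of VERIFY-ESTIMATE and COARSE-ESTIMATE, the query complexity of VERIFY-ESTIMATE is O(log^2 n), and COARSE-ESTIMATE calls VERIFY-ESTIMATE O(log^2 n) times. Hence, COARSE-ESTIMATE makes O(log^4 n) many queries.

The Final Triangle Estimation Algorithm: Proof of Theorem 2
Now we design an algorithm for (1 ± ε)-multiplicative approximation of t(G). We build a data structure that maintains two things at any point of time:
(i) An accumulator ψ for the number of triangles. We initialize ψ = 0.
(ii) A set of tuples (A_1, B_1, C_1, w_1), . . ., (A_r, B_r, C_r, w_r), where each (A_i, B_i, C_i) is a tripartite subgraph with weight w_i.
Before discussing the steps of our algorithm, some remarks about our sparsification lemmas (Lemmas 3 and 7) are in order.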
In our algorithm, we apply Lemma 3 with k = 1. Also, we require λ = ε/(6 log n).
So, Lemma 3 gives a useful result in our algorithm when t(G) ≥ … sample from the set of tuples such that the sample approximately maintains the required estimate, by using Lemma 9. We use the algorithm corresponding to Lemma 9 with λ = ε/(6 log n), ρ = 64 log^2 n and δ = … for some constant κ_3 > 0. This κ_3 is the same as the one mentioned in Step 3. Also, note that no query is required to execute the algorithm of Lemma 9. Recall that the number of tuples present at any time is O(N). Also, the coarse estimation for each tuple can be done by using O(log^4 n) many queries (Lemma 8). Hence, the number of queries in this step in each iteration is O(N · log^4 n).
Step 7: (Sparsification for Tripartite Graphs) We partition each of A, B and C into 3 parts uniformly at random, namely U_1, U_2, U_3; V_1, V_2, V_3; and W_1, W_2, W_3, respectively.
We delete (A, B, C, w) from the data structure and add (U i , V i , W i , 9w) for each i ∈ [3] to our data structure.Note that no query is made in this step.
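Step 7 can be sketched as follows; the helper names are hypothetical. A triangle (a, b, c) survives only if a, b and c receive the same part index, which happens with probability 3 · (1/3)^3 = 1/9, hence the weight 9w. Dropping tuples with an empty part is our choice: such a tuple contains no triangle, and the TIS oracle is defined only on non-empty sets.

```python
import random


def split3(S, rng):
    """Partition S into three parts, placing each element in a uniformly random part."""
    parts = [set(), set(), set()]
    for v in sorted(S):
        parts[rng.randrange(3)].add(v)
    return parts


def sparsify_tripartite(A, B, C, w, rng):
    """Replace (A, B, C, w) by the tuples (U_i, V_i, W_i, 9w), i = 1, 2, 3."""
    U, V, W = split3(A, rng), split3(B, rng), split3(C, rng)
    return [(U[i], V[i], W[i], 9 * w) for i in range(3) if U[i] and V[i] and W[i]]


# Complete tripartite graph K_{3,3,3}: t(A, B, C) = 27 and t(U, V, W) = |U||V||W|.
A, B, C = {0, 1, 2}, {3, 4, 5}, {6, 7, 8}
rng = random.Random(3)
ests = []
for _ in range(20000):
    out = sparsify_tripartite(A, B, C, 1, rng)
    ests.append(sum(w * len(U) * len(V) * len(W) for U, V, W, w in out))
print(sum(ests) / len(ests))  # should be close to 27
```

Averaged over many runs, the weighted count matches t(A, B, C), which is the unbiasedness that Lemma 7 strengthens into a concentration bound.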
Step 8: Report ψ as the estimate for the number of triangles in G, when no tuples are left.
First, we prove that the above algorithm produces a (1 ± ε)-multiplicative approximation to t(G) for any ε > 0 with high probability. Recall the description of Step 1 of the algorithm. If the algorithm terminates in Step 1, then we have a (1 ± ε)-approximation to t(G) by Lemma 4. Otherwise, we decide that t(G) > τ and proceed to Step 2. In Step 2, the algorithm colors V(G) using three colors and incurs a multiplicative error of 1 ± ε_0 to t(G), where ε_0 = κ_1 d log n / √(t(G)). This is because of Remark 1 and our choice of τ. As t(G) > τ and n ≥ 64, ε_0 ≤ λ = ε/(6 log n). Note that the algorithm possibly performs Steps 4 to 7 multiple times, but not more than O(log n) times, as explained below.
Let (A_1, B_1 … So, after 3 log n iterations, there will be at most a constant number of active triangles, and then we can compute the exact number of active triangles and add it to ψ. In each iteration, there can be a multiplicative error of 1 ± λ in Step 5 and of 1 ± ε_0 due to Step 4. So, using the fact that ε_0 ≤ λ, the multiplicative approximation factor lies between (1 − λ)^{3 log n + 1} and (1 + λ)^{3 log n + 1} … In the above expression, we have put τ = max{36κ …
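The final accuracy step can be checked numerically. The sketch below assumes λ = ε/(6 log n), base-2 logarithms and n ≥ 64, and confirms on a small grid that (1 ± λ)^{3 log n + 1} lies in [1 − ε, 1 + ε]. Analytically, with m = 3 log n + 1, (1 − λ)^m ≥ 1 − mλ and (1 + λ)^m ≤ e^{mλ}, and mλ ≤ 19ε/36 when log n ≥ 6; the function name is hypothetical.

```python
import math


def factor_bounds(n, eps):
    """(1 - lam)^m and (1 + lam)^m with lam = eps / (6 log2 n) and m = 3 log2 n + 1."""
    L = math.log2(n)
    lam = eps / (6 * L)
    m = 3 * L + 1
    return (1 - lam) ** m, (1 + lam) ** m


for n in (64, 2 ** 10, 10 ** 6, 2 ** 40):
    for eps in (0.01, 0.1, 0.5, 0.99):
        lo, hi = factor_bounds(n, eps)
        assert 1 - eps <= lo and hi <= 1 + eps, (n, eps, lo, hi)
print("all bounds within [1 - eps, 1 + eps]")
```

The check only covers the error accumulated over 3 log n + 1 rounds; it says nothing about the probability that each individual round stays within 1 ± λ.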

Discussions
In this work, we generalize the framework of Beame et al. [7] for EDGE ESTIMATION to solve TRIANGLE-ESTIMATION by using TIS queries. Our algorithm makes O(ε^{-4} d^2 log^18 n) many TIS queries and returns a (1 ± ε)-approximation to the number of triangles with high probability, where d is the upper bound on Δ_E. The downside of our work is the assumption Δ_E ≤ d. Note that Beame et al. [7] had no such assumption. Removing the assumption is non-trivial, mainly because two triangles can share an edge, whereas two edges can share only a vertex. Our sparsification algorithm crucially uses the assumption on Δ_E, and that remains the main barrier to cross. Recall our sparsification lemma (Lemma 3) and the definition of properly colored triangles (Definition 11). Roughly speaking, our sparsification algorithm first colors the vertices of the graph, then counts the number of properly colored triangles, and finally scales this count to obtain an estimate of the total number of triangles in the graph. Consider the situation when all the triangles in the graph share a common edge e. If e is not properly colored, then we cannot keep track of any triangle in G. As a follow-up to this paper, Dell et al. [13] and Bhattacharya et al. [4] independently generalized our result to c-uniform hypergraphs, where c ∈ N is a constant. In Section 1, we already noted that TRIANGLE-ESTIMATION can also be thought of as the HYPEREDGE ESTIMATION problem in a 3-uniform hypergraph. Their results showed that the bound on Δ_E is not necessary to solve TRIANGLE-ESTIMATION by using polylogarithmically many TIS queries. The main technical contribution of both papers is a sparsification algorithm that handles the case when Δ_E is not necessarily bounded. Note that the sparsification algorithms in the two papers are completely different and give different insights. Bhattacharya et al. [4] and Dell et al. [13] refer to the generalized oracle as the GENERALISED PARTITE INDEPENDENT SET (GPIS) oracle and the COLORFUL DECISION (CD) oracle, respectively. Bhattacharya et al. [4] showed that HYPEREDGE ESTIMATION can be solved by using O_c(ε^{-4} log^{5c+5} n) many GPIS queries, and Dell et al. [13] showed that it can be solved by using O_c(ε^{-2} log^{4c+8} n) many CD queries, with high probability. Substituting c = 3 in their algorithms, we obtain two different algorithms for TRIANGLE-ESTIMATION. Let us compare our result (stated in Theorem 22) with the results of Bhattacharya et al. [4] and Dell et al. [13] in the context of TRIANGLE-ESTIMATION. If Δ_E = o(log n), our algorithm for TRIANGLE-ESTIMATION has lower query complexity than that of Bhattacharya et al. [4] for any given ε > 0. Also, when Δ_E = o(log n) and ε > 0 is a fixed constant, our algorithm for TRIANGLE-ESTIMATION has lower query complexity than that of Dell et al. [13].
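To make the comparison concrete, the sketch below evaluates the three bounds with all hidden constants (including the c-dependent ones) set to 1 and c = 3, so it only illustrates the asymptotic claim; the exponents are those stated above, and logarithms are assumed to be base 2. With c = 3 both follow-up bounds carry log^20 n, so the ratio of our bound to the GPIS bound is d^2/log^2 n, and to the CD bound it is d^2/(ε^2 log^2 n). These ratios are below 1 exactly when d < log n and d < ε log n, respectively.

```python
import math


def ours(n, eps, d):
    """O(eps^-4 d^2 log^18 n), constants dropped."""
    return d ** 2 * math.log2(n) ** 18 / eps ** 4


def gpis(n, eps, c=3):
    """Bhattacharya et al. [4]: O_c(eps^-4 log^(5c+5) n), constants dropped."""
    return math.log2(n) ** (5 * c + 5) / eps ** 4


def cd(n, eps, c=3):
    """Dell et al. [13]: O_c(eps^-2 log^(4c+8) n), constants dropped."""
    return math.log2(n) ** (4 * c + 8) / eps ** 2


n, eps = 2 ** 20, 0.5          # log2 n = 20
for d in (1, 5, 40):
    print(d, ours(n, eps, d) / gpis(n, eps), ours(n, eps, d) / cd(n, eps))
```

For d = 40 > log2 n = 20, our bound is larger than both, which is consistent with the condition Δ_E = o(log n) in the comparison above.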
sharing information among their neighbors in G. Observe that the information of a node is derived from its set of neighbors. So, if two nodes have a large number of common neighbors in G, then there is no need for an edge between the two nodes. Hence, the number of triangles on any edge in the graph is bounded. The objective is to compute the number of triangles in G, that is, the number of triples of nodes in G such that each pair of them is connected.
In (i) and (ii), the TIS oracle can be implemented very efficiently: a TIS query can be answered by running a standard plane sweep algorithm from computational geometry in O(n log n) time.
The following lemma directly follows from Lemma 27.
Lemma 28 Let X_1, . . ., X_n be indicator random variables such that there are at most d many X_j's on which an X_i depends, and let X = Σ_{i=1}^n X_i.

Theory of Computing Systems (2021) 65:1165-1192

Let $\{o_1, \ldots, o_w\}$ be the set of $w$ objects. Let $X_i$, $i \in [w]$, be the indicator random variable such that $X_i = 1$ if and only if $o_i$ is colored with $\{\alpha, \beta\}$. Note that $N_{\{\alpha,\beta\}} = \sum_{i=1}^{w} X_i$.
Claim 12 (a) Let $S = \{(a_1, b_1, t), \ldots, (a_v, b_v, t)\}$. Note that $v \le \Delta_t$. As $\Delta_E \le d$, each vertex in $[n]$ can be present in at most $d$ pairs of $S$. Now we apply Lemma 14, setting $X = [n]$ and $F = S$. Observe that $N_{a_t} = M_{\pi(a_t)\setminus\{a_t\}}$ and $\tilde{N}_{a_t} = \tilde{M}_{\pi(a_t)\setminus\{a_t\}}$. So, by Lemma 14 (i), $P\left(|N_{a_t} - \tilde{N}_{a_t}| \ge 8\sqrt{dv\log n}\right) \le \frac{4}{n^8}$. Since $v \le \Delta_t$, this implies $P\left(|N_{a_t} - \tilde{N}_{a_t}| \ge 8\sqrt{d\Delta_t\log n}\right) \le \frac{4}{n^8}$.
(b) Let $S_r = \{(t, a_1, b_1), \ldots, (t, a_v, b_v)\}$. Note that $v \le \Delta_t$, the number of triangles incident on vertex $t$. As $\Delta_E \le d$, each vertex in $[n]$ can be present in at most $d$ pairs of $S_r$. Now we apply Lemma 14, setting $X = [n]$ and $F = S_r$. Observe that $N^{r}_{a_t} = M_{\pi(a_t)\setminus\{a_t\}}$ and …
(c) Let $S_r = \{(a_1, t, b_1), \ldots, (a_w, t, b_w)\}$. Without loss of generality, assume that $a_i \in [t-1]$ and $b_i \in [n] \setminus [t]$. Note that $w \le \Delta_t$. Given that the vertex $t$ is colored with color $c$ and that $Z_1, \ldots, Z_{t-1}$ are known, define the set
$P_c := \{(a, t, b) \in S_r : t \text{ is colored with } c \text{ and } P((a, t, b) \text{ is properly colored}) > 0\}$.
Let $Q_c = |P_c|$. Observe that for $(a, t, b) \in S_r$, $P((a, t, b) \text{ is properly colored}) > 0$ if and only if $a$ is colored with some color in $\pi(c)\setminus\{c\}$. Now we apply Lemma 14, setting $X = [n]$ and $P = \{a_1, \ldots, a_w\}$. Observe that $P_{\pi(a_t)\setminus\{a_t\}} = P_{a_t}$ and $\tilde{P}_{\pi(a_t)\setminus\{a_t\}} = \tilde{P}_{a_t}$. By (iii) of Lemma 14, …
… Then for any $\delta > 0$, $P(|X - E[X]| \ge \delta) \le 2e^{-2\delta^2/((d+1)n)}$.
Lemma 29 (Importance sampling [7]) Let $(D_1, w_1, e_1), \ldots, (D_r, w_r, e_r)$ be the given structures, where each $D_i$ has an associated weight $c(D_i)$ satisfying (i) $w_i, e_i \ge 1$ for all $i \in [r]$; (ii) $e_i/\rho \le c(D_i) \le e_i\rho$ for some $\rho > 0$ and all $i \in [r]$; and (iii) $\sum_{i=1}^{r} w_i \cdot c(D_i) \le M$. Note that the exact values $c(D_i)$ are not known to us. Then there exists an algorithm that finds $(D'_1, w'_1, e'_1), \ldots, (D'_s, w'_s, e'_s)$ such that, with probability at least $1 - \delta$, all three conditions above hold and $\left|\sum_{i=1}^{s} w'_i \cdot c(D'_i) - \sum_{i=1}^{r} w_i \cdot c(D_i)\right| \le \lambda S$, where $S = \sum_{i=1}^{r} w_i \cdot c(D_i)$ and $\lambda, \delta > 0$. The time complexity of the algorithm is $O(r)$ and $s = O\left(\frac{\rho^4 \log M \left(\log\log M + \log\frac{1}{\delta}\right)}{\lambda^2}\right)$.
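Lemma 29 is used as a black box, and its algorithm is not reproduced in this section. As a rough illustration of the underlying idea only, the Python sketch below applies generic importance sampling, not the specific procedure of [7]: each structure is kept with probability proportional to $w_i e_i$ and its weight is rescaled by the inverse of that probability. The weighted sum $\sum_i w_i \cdot c(D_i)$ is therefore preserved in expectation, and conditions (i) and (ii) continue to hold for the kept structures. The names `Structure` and `importance_sample` and the sampling budget `k` are hypothetical, and the sketch does not establish the bound on $s$ stated in the lemma.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Structure:
    """Hypothetical stand-in for a tuple (D_i, w_i, e_i) from Lemma 29."""
    payload: object   # the structure D_i itself, e.g. a tripartite subgraph
    weight: float     # w_i >= 1
    estimate: float   # e_i >= 1, assumed to satisfy e_i/rho <= c(D_i) <= e_i*rho


def importance_sample(structs: List[Structure], k: float,
                      rng: random.Random) -> List[Structure]:
    """Generic importance sampling: keep structure i with probability
    p_i = min(1, k * w_i * e_i / total) and rescale its weight to w_i / p_i.

    Since E[kept weight of i] = p_i * (w_i / p_i) = w_i, the weighted sum
    sum_i w_i * c(D_i) is unbiased; the expected number kept is at most k.
    """
    total = sum(s.weight * s.estimate for s in structs)
    kept: List[Structure] = []
    for s in structs:
        p = min(1.0, k * s.weight * s.estimate / total)
        if rng.random() < p:  # always true when p == 1.0
            kept.append(Structure(s.payload, s.weight / p, s.estimate))
    return kept


if __name__ == "__main__":
    # Demo: costs c(D_i) equal the estimates (rho = 1), true sum = 55.
    structs = [Structure(i, 1.0, float(i + 1)) for i in range(10)]
    rng = random.Random(1)
    runs = [importance_sample(structs, 5.0, rng) for _ in range(2000)]
    avg = sum(sum(s.weight * s.estimate for s in r) for r in runs) / len(runs)
    print(f"average reweighted sum: {avg:.2f}")
```

Averaging the reweighted sum over many independent runs approaches the original sum; a larger `k` keeps more structures and lowers the variance, which is the trade-off the lemma quantifies through $s$.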
… the number of internal nodes of $T$ is bounded by $2t(A, B, C)\log n$. So, the number of leaf nodes in $T$ is at most $16t(A, B, C)\log n$, and hence the total number of nodes in $T$ is at most $16t(A, B, C)\log n$. Putting everything together, the required query complexity is $O(t(A, B, C)\log n)$.
Proof The algorithm proceeds similarly to the one presented in the proof of Lemma 6, initializing a tree $T$ with $(A, B, C)$ as the root. If $t(A, B, C) \le \tau$, then we can find $t(A, B, C)$ by using $16t(A, B, C)\log n$ many queries, and the number of nodes in $T$ is bounded by $16t(A, B, C)\log n$. So, if the number of nodes in $T$ exceeds $16\tau\log n$ at any point during the execution of the algorithm, we report $t(G) > \tau$ and terminate. Hence, the query complexity is bounded by the number of nodes in $T$, which is $O(\tau\log n)$.
Lemma 17 (Lemma 4 Restated) There exists an algorithm that, for any graph $G$, a threshold parameter $\tau \in \mathbb{N}$ and an $\epsilon \in (0, 1)$, determines whether $t(G) > \tau$. If $t(G) \le \tau$, the algorithm gives a $(1 \pm \epsilon)$-approximation to $t(G)$ by using $O\left(\frac{\tau\log^2 n}{\epsilon^2}\right)$ many TIS queries with probability at least $1 - n^{-10}$.
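The threshold argument in the proof preceding Lemma 17 explores a recursion tree $T$ rooted at $(A, B, C)$ and aborts once $T$ has more than $16\tau\log n$ nodes. The splitting rule comes from the proof of Lemma 6, which is not part of this section, so the Python sketch below assumes one plausible rule: every node on which TIS answers YES halves each of its non-singleton sets, giving up to eight children, and a YES node with three singleton sets is exactly one triangle. The oracle built by `make_tis` is a brute-force simulation on an explicit edge set; `count_tripartite` and `halves` are hypothetical names, logarithms are taken base 2, and $\tau \ge 1$ is assumed.

```python
import math
from typing import FrozenSet, List, Optional, Set


def make_tis(edges: Set[FrozenSet[int]]):
    """Brute-force TIS oracle on an explicit graph (for simulation only).
    Returns the oracle and a one-element list counting the queries made."""
    calls = [0]

    def tis(A: List[int], B: List[int], C: List[int]) -> bool:
        calls[0] += 1
        return any(frozenset((a, b)) in edges and frozenset((b, c)) in edges
                   and frozenset((a, c)) in edges
                   for a in A for b in B for c in C)

    return tis, calls


def halves(S: List[int]) -> List[List[int]]:
    """Split a vertex list into two halves; a singleton is left as is."""
    if len(S) == 1:
        return [S]
    mid = len(S) // 2
    return [S[:mid], S[mid:]]


def count_tripartite(A: List[int], B: List[int], C: List[int], tis,
                     tau: int, n: int) -> Optional[int]:
    """Return t(A, B, C) exactly, or None if the tree exceeds
    16 * tau * log2(n) nodes (then t(A, B, C) > tau). Assumes tau >= 1
    and A, B, C non-empty and pairwise disjoint."""
    budget = 16 * tau * max(1, math.ceil(math.log2(n)))
    nodes, triangles = 0, 0
    stack = [(A, B, C)]
    while stack:
        X, Y, Z = stack.pop()
        nodes += 1  # count this tree node; each explored node costs one query
        if nodes > budget:
            return None
        if not tis(X, Y, Z):
            continue  # no triangle across X, Y, Z: prune this subtree
        if len(X) == len(Y) == len(Z) == 1:
            triangles += 1  # YES on three singletons is one triangle
            continue
        for X2 in halves(X):
            for Y2 in halves(Y):
                for Z2 in halves(Z):
                    stack.append((X2, Y2, Z2))
    return triangles
```

When the search completes, the returned count is exact and may itself exceed $\tau$, so the caller compares it with $\tau$. Under the assumed rule each level of $T$ holds at most $t(A, B, C)$ YES nodes, so $T$ has at most $1 + 8\,t(A, B, C)\lceil\log_2 n\rceil$ nodes, which stays within the budget whenever $t(A, B, C) \le \tau$.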
Algorithm 3 corresponds to Lemma 9, and Algorithm 2 is a subroutine of Algorithm 3. Algorithm 2 determines whether a given estimate $t$ is correct up to an $O(\log^2 n)$ factor. Lemmas 18 and 19 are intermediate results needed to prove Lemma 9.

6 The Final Triangle Estimation Algorithm: Proof of Theorem 2
… of any graph $G$, returns an estimate $t$ satisfying …
Proof Note that an execution of COARSE-ESTIMATE for a particular $t$ repeats VERIFY-ESTIMATE $2000\log n$ times and gives output $t$ if at least 10 of these executions of VERIFY-ESTIMATE accept. For a particular $t$, let $X_i$ be the indicator random variable such that $X_i = 1$ if and only if the $i$th execution of VERIFY-ESTIMATE accepts, and let $X = \sum_{i=1}^{2000\log n} X_i$. COARSE-ESTIMATE gives output $t$ if $X > 10$. From the description of VERIFY-ESTIMATE and COARSE-ESTIMATE, the query complexity of VERIFY-ESTIMATE is $O(\log^2 n)$, and COARSE-ESTIMATE calls VERIFY-ESTIMATE $O(\log^2 n)$ times. Hence, COARSE-ESTIMATE makes $O(\log^4 n)$ queries. Now we design an algorithm for $(1 \pm \epsilon)$-multiplicative approximation of $t(G)$.
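The repetition scheme analysed in the proof of COARSE-ESTIMATE, which runs VERIFY-ESTIMATE $2000\log n$ times for a fixed guess and outputs the guess when enough runs accept, is a standard amplification step. The Python sketch below treats VERIFY-ESTIMATE as an opaque callback `verify(t)`. The function name `coarse_estimate`, the order of the guesses (halving from $n^3$ down to 1) and the acceptance fraction `accept_frac` are hypothetical choices for illustration, not the parameters of COARSE-ESTIMATE.

```python
import math
from typing import Callable, Optional


def coarse_estimate(n: int, verify: Callable[[float], bool],
                    reps_const: int = 2000,
                    accept_frac: float = 0.1) -> Optional[float]:
    """Hypothetical amplification loop around an opaque VERIFY-ESTIMATE.

    Tries the guesses t = n^3, n^3/2, n^3/4, ..., 1. For each guess, runs
    verify(t) reps = ceil(reps_const * log2 n) times and outputs t as soon
    as more than accept_frac * reps of the runs accept. Returns None if no
    guess is accepted.
    """
    reps = max(1, math.ceil(reps_const * math.log2(n)))
    t = float(n) ** 3
    while t >= 1:
        accepts = sum(1 for _ in range(reps) if verify(t))
        if accepts > accept_frac * reps:
            return t
        t /= 2  # O(log n) guesses in total
    return None
```

With $O(\log n)$ guesses and $\Theta(\log n)$ repetitions per guess, `verify` is called $O(\log^2 n)$ times, in line with the count in the proof; if each call makes $O(\log^2 n)$ TIS queries, the total is $O(\log^4 n)$.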
… $(A_1, B_1, C_1, w_1), \ldots, (A_\zeta, B_\zeta, C_\zeta, w_\zeta)$, where the tuple $(A_i, B_i, C_i)$ corresponds to the tripartite subgraph $G(A_i, B_i, C_i)$ and $w_i$ is the weight associated with $G(A_i, B_i, C_i)$. Initially, there is no tuple in our data structure.
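The iterative algorithm that maintains these weighted tuples may lose some accuracy in every round, so it helps to check how per-round errors compound. As a worked check under stated assumptions, suppose each iteration changes the weighted count of active triangles by a factor in $[1-\lambda, 1+\lambda]$ with $\lambda = \frac{\epsilon}{6\log n}$, and that there are at most $k \le 3\log n$ iterations (the constant 3 is an assumption for this sketch; the analysis states only $O(\log n)$ iterations), so that $k\lambda \le \epsilon/2$. The first two lines then show the accumulated factor stays within $1 \pm \epsilon$ for $\epsilon \in (0, 1)$, using $1 + x \le e^{x}$, convexity of $e^{\epsilon/2}$ on $[0, 1]$, and Bernoulli's inequality. The last line sums the per-iteration query costs of Steps 4 and 6 over $O(\log n)$ iterations.

```latex
\begin{align*}
(1+\lambda)^k &\le e^{\lambda k} \le e^{\epsilon/2}
  \le 1 + \left(e^{1/2}-1\right)\epsilon \le 1 + \epsilon, \\
(1-\lambda)^k &\ge 1 - k\lambda \ge 1 - \tfrac{\epsilon}{2} \ge 1 - \epsilon, \\
\sum_{i=1}^{O(\log n)} \Big( O(\tau N \log n) + O(N \log^4 n) \Big)
  &= O\!\left(\tau N \log^2 n + N \log^5 n\right).
\end{align*}
```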
… $1/n^{10}$ to find a new set of tuples $(A'_1, B'_1, C'_1, w'_1), \ldots, (A'_s, B'_s, C'_s, w'_s)$ such that … Here $(A_1, B_1, C_1, w_1), \ldots, (A_\zeta, B_\zeta, C_\zeta, w_\zeta)$ are the tuples currently present in the data structure. We define $\sum_{i=1}^{\zeta} t(A_i, B_i, C_i)$ as the number of active triangles. Let $\mathrm{ACTIVE}_i$ be the number of triangles that are active in the $i$th iteration. Note that $\mathrm{ACTIVE}_1 \le t(G) \le n^3$. By Lemma 7 and Step 7, observe that ACTIVE i+1 ≤ ACTIVE i 3 log n+1 . As $\lambda = \frac{\epsilon}{6\log n}$, the required approximation factor is $1 \pm \epsilon$. The query complexity of Step 1 is O τ log n 2 . Steps 2, 3, 5, 7 and 8 do not make any queries to the oracle. The query complexity of Step 4 is $O(\tau N\log n)$ in each iteration, and that of Step 6 is $O(N\log^4 n)$ in each iteration. The total number of iterations is $O(\log n)$. Hence, the total query complexity of the algorithm is …
Theorem 22 (Restatement of Theorem 2) Let $G$ be a graph with $\Delta_E \le d$ and $|V(G)| = n \ge 64$. For any $\epsilon > 0$, TRIANGLE-ESTIMATION can be solved using $O(d^2\log^{18} n \cdots)$