Cycle Counting Under Local Differential Privacy for Degeneracy-Bounded Graphs
Abstract
We propose an algorithm for counting the number of cycles under local differential privacy for degeneracy-bounded input graphs. Numerous studies have focused on counting the number of triangles under the privacy notion, demonstrating that the expected -error of these algorithms is , where is the number of nodes in the graph. When parameterized by the number of cycles of length four (), the best existing triangle counting algorithm has an error of . In this paper, we introduce an algorithm with an expected -error of , where is the degeneracy and is the maximum degree of the graph. For degeneracy-bounded graphs () commonly found in practical social networks, our algorithm achieves an expected -error of . Our algorithm’s core idea is a precise count of triangles following a preprocessing step that approximately sorts the degree of all nodes. This approach can be extended to approximate the number of cycles of length , maintaining a similar -error, namely or for degeneracy-bounded graphs.
Keywords and phrases:
Differential privacy, triangle counting, degeneracy, arboricity, graph theory, parameterized accuracyFunding:
Quentin Hillebrand: partially supported by KAKENHI Grant 20H05965, and by JST SPRING Grant Number JPMJSP2108.Copyright and License:
![[Uncaptioned image]](x1.png)
2012 ACM Subject Classification:
Security and privacy Privacy-preserving protocols ; Theory of computation Graph algorithms analysisAcknowledgements:
The authors wish to express their thanks to the anonymous reviewers whose valuable feedback greatly enhanced the quality of this paper.Editors:
Olaf Beyersdorff, Michał Pilipczuk, Elaine Pimentel, and Nguyễn Kim ThắngSeries and Publisher:

1 Introduction
In recent years, differential privacy [13, 15] has become the gold standard for providing strong privacy guarantees while enabling meaningful data analysis. Differential privacy ensures that the output of a computation does not significantly change when any single individual’s data is modified, thus safeguarding individual privacy. While much of the initial work in differential privacy focused on traditional tabular data [14, 26], there is increasing interest in extending these privacy guarantees to graph data [31, 35], which presents its own unique set of challenges.
Differential privacy has evolved into numerous variants to accommodate different scenarios, as detailed in [8]. Of particular interest to us is the concept of local differential privacy [7, 17]. This variant is unique in that it does not rely on the assumption of a trusted central server. Instead, users must obfuscate their private data before sharing it with an untrusted computing entity. In the context of graph data, the most commonly adopted notion is edge local differential privacy [29], where the sensitive information of each user pertains to their connections with others.
A widely used obfuscation method is randomized response [33, 32]. In this approach, users invert each bit of their adjacency vector with a certain probability. The server then collects this distorted information to construct an obfuscated graph. Although it is possible to publish various graph statistics from the obfuscated graph, the resulting information tends to be imprecise. Algorithms specifically designed to publish particular statistics typically yield more accurate and useful graph information.
Upper Bound | Lower Bound | |
Triangle | ([22], general graphs) | (non-interactive) [24] |
([16], general graphs) | (interactive) [16] | |
(this work, degeneracy-bounded graph) | (non-interactive) [16] | |
Odd length | (folklore, general graphs) | |
cycles | (this work, degeneracy-bounded graph) |
One graph statistic frequently considered by researchers in local differential privacy is the number of subgraphs [22, 20]. Specifically, many studies have focused on the publication of triangle counts [22, 23, 20, 16]. Theoretical analysis results on the -error are summarized in Table 1. Unfortunately, to date, when is the number of nodes and is the maximum degree of the input graphs, the best algorithm has an expected -error of or . We believe that this error is too large for many applications and should be improved. On the other hand, it has been shown that for all locally differentially private algorithms, there exists a class of graphs where the -error is [16]. This lower bound implies that the expected -error cannot be significantly improved.
1.1 Our Contribution
This motivates us to consider a specific class of graphs. In this paper, we specifically focus on graphs with bounded degeneracy, as most social graphs exhibit degeneracy values that are substantially smaller than both the number of vertices and the maximum degree. The table in the Appendix of [12] provides statistics on a diverse range of graphs, detailing the number of nodes, degeneracy, and maximum degree. As shown in the paper, for sufficiently large graphs, degeneracy is consistently at least an order of magnitude smaller than the maximum degree, and in some instances, several orders of magnitude smaller.
Additionally, several synthetic graph models commonly considered realistic naturally produce graphs with low degeneracy. Examples include preferential attachment graphs [1] and bounded expansion graphs [27].
Degeneracy is particularly significant in parameterizing the complexity of subgraph counting algorithms, as demonstrated in [6, 2, 5]. Given the relationship between computational complexity and estimation error in triangle counting algorithms, degeneracy is an important parameter for characterizing accuracy.
Let the graph degeneracy be . We propose a locally differentially private algorithm with an expected -error of . When the graph degeneracy is bounded (), the expected -error becomes . This result implies that our expected error for the degeneracy-bounded graphs can be smaller than the lower bound for general graphs.
We also extend our results to count the number of cycles with odd lengths in degeneracy-bounded graphs. To our knowledge, there are only two local differentially private algorithms proposed for counting subgraphs of more than three nodes. The first algorithm [24] is designed to count the number of four-length cycles but operates within the shuffle model, which is weaker than the original local differential privacy model. The second algorithm counts the number of walks of length [3]. This field has limited work due to the significant noise introduced to ensure user privacy, which accumulates as the subgraph size increases. This accumulation results in unacceptable errors for differential privacy in larger subgraphs. For instance, while the expected -error from triangle counting algorithms based on randomized response is [24], the expected -error for similar algorithms estimating the number of is as high as . In other words, the error increases by a factor of with each increment in cycle length.
In this work, we propose an algorithm that significantly reduces the expected -error to in degeneracy-bounded graphs. We believe that this error is much smaller than the actual number of cycles in most graphs. Consequently, our algorithm is the first to publish a meaningful number of large cycles under local differential privacy.
1.2 Technical Overview
In this section, we provide an overview of the technical concepts behind our triangle counting algorithm. The algorithm for counting odd-length cycles, for , extends these ideas but requires a more intricate and detailed analysis.
Let the input graph be . In prior work [22], they apply a randomized response mechanism that flips each bit in the adjacency matrix with a certain probability. Let the resulting graph after applying the randomized response be . In the local differential privacy setting, each node knows whether it is connected to another node (where ) if . For the triangle counting method, node considers as a triangle if , , and . Define if node considers as a triangle, otherwise set . Define . The estimated number of triangles for node , reported by the user, is . The total estimated number of triangles in the graph is then where dividing by three corrects for the fact that each triangle is counted once by each of the three users forming it (i.e., triple-counted), ensuring each triangle is counted only once.
The -error of the estimated triangle count mostly arises from the variance in the estimation. A significant portion of this variance comes from the covariance between pairs of variables in the summation . Two variables, and , are dependent if . The number of dependent pairs in the counting process is equivalent to the number of tuples such that , which corresponds to the number of 4-cycles in the input graph . Therefore, the squared -error is approximately proportional to the number of 4-cycles in the graph, which is .
Let us assume that the indices of all users are predetermined and publicly known before the counting process begins. Define . If node only considers the pairs within , then each triangle is counted exactly once. The estimated number of triangles, , can be calculated as , where . In this counting method, the number of dependent variable pairs is at most the number of 4-cycles that contain the three nodes with .
Let represent the degeneracy of the input graph , and for each , let denote the degree of . Assume that the degrees of all nodes are publicly known, and the nodes are indexed in non-decreasing order of their degree, i.e., if , then . Referring to the bound established by Chiba and Nishizeki [6], which states that , we demonstrate in this paper that the number of such cycles is . Consequently, the squared -error is reduced from in previous work to .
However, we cannot assume that the degrees of all nodes are publicly known, as this information is sensitive. To address this issue, we use local Laplacian queries, allowing each user to publish a noisy version of their degree. Let the noisy degree of be denoted as . We then assign indices to users based on these noisy degrees, such that if , then . Afterward, we run the protocol described in the previous paragraph. We show that even with noisy degrees, the expected number of such cycles remains bounded by .
In summary, our mechanism involves two steps. First, users publish their noisy degrees using the local Laplacian mechanism, and the server assigns indexes based on these noisy values. In the second step, using the results of randomized response, each user estimates the number of triangles where . This method significantly reduces the number of dependent triangle pairs in degeneracy-bounded graphs, which in turn lowers the variance of the estimation.
1.3 Related Works
The field of graph data mining under local differential privacy is relatively new. In contrast, differential privacy has been studied for many years by various researchers, including works like [18, 28]. According to [22], local differential privacy typically only hides edges or relationships, except in special cases like [36]. Differential privacy, on the other hand, can hide whether an individual or node is part of a social network, as shown in [19, 30]. Therefore, while both edge and node differential privacy exist, node differential privacy does not apply in the context of local differential privacy.
Recent works have proposed methods to estimate the densest subgraph, -core decomposition, and degeneracy under local differential privacy [10, 9, 11]. However, since we are focused on estimating different graph statistics in graphs, we do not use or extend the ideas from these works. Instead, the estimation of degeneracy can be used to approximate the -error of our algorithm.
2 Preliminaries
2.1 Notations
For a set of vertices and a set of edges, we denote by the graph on . We consider simple undirected graphs, meaning that for , and . We denote by the size of the graph and its number of edges.
For each , we introduce , the adjacency list of user , where for any , if the edge is in and otherwise. Additionally, we introduce , the degree of node , which corresponds to the number of edges incident to .
We call a path of length , denoted , any tuple such that, for all , , and, for all , . We also use to refer to the number of paths of length in . Similarly, a cycle of length , or , is a tuple that forms a path and satisfies . We will also use to refer to the number of cycles of length in .
2.2 Edge Local Differential Privacy
We say that two adjacency lists and are neighboring if they differ by one bit, i.e. if we can go from one to the other by adding or removing an edge to node . If is a neighbor of , we write that . The notion of edge local differential privacy is as follows:
Definition 1 (-edge local differentially private query).
Let . A randomized algorithm is a -edge local differentially private query on the node if, for all neighboring bit strings , and for all , it holds that
Definition 2 (-edge local differentially private algorithm [29]).
Let be an algorithm that generates multiple randomized queries for each user, has each user apply these queries to their adjacency vector, and then estimates some graph statistics based on the results. We say is an -edge local differentially private algorithm if, for all users and for all possible sets of queries inquired to (where for each , is an -edge local differentially private query), it holds that .
2.3 Laplacian Query and Restricted Sensitivity
Next, we introduce queries that are -edge local differentially private. We first consider a query which aims to give an estimate of a real number statistics of the adjacency vector.
Definition 3 (Edge local Laplacian query [21]).
For a function on adjacency lists, and denoting neighboring adjacency lists, the global sensitivity of is defined as For , the query that outputs is -edge local differentially private, where represents noise drawn from the Laplacian distribution with parameter .
Global sensitivity in Definition 3 is designed to handle the worst-case scenario, which can lead to large amounts of noise being added to the data when using the Laplacian mechanism. However, if the data is known to belong to a specific set, restricted sensitivity allows us to adjust the noise according to the sensitivity within that set, resulting in more tailored and potentially lower noise levels.
Definition 4 (Restricted sensitivity (Definition 8 of [4])).
Let and be the Hamming distance between and . The restricted sensitivity of over a set of possible output is
We can use restricted sensitivity to publish data even if it is not initially in the set. To do this, we first need to define a projection method to map the data to the set. In this work, we will consider , the class of adjacency list with a maximum degree of , for calculating restricted sensitivity. We assume that the order of all nodes is fixed, and if a node is adjacent to more than nodes, we retain only the first nodes according to this order. The map can be considered as an operation on each adjacency vector . We denote the mapping result on as .
Definition 5 (Edge local Laplacian query with restricted sensitivity on [4]).
For any queried to a user , the query that answers is called edge local Laplacian query with restricted sensitivity on , and provides -edge local differential privacy.
2.4 Unbiased Randomized Response
In this subsection, we consider the randomized response query, which aims to publish an obfuscated adjacency vector.
Definition 6 (Randomized response query [33, 32]).
For , the randomized response mechanism takes an adjacency list as input and outputs an obfuscated list . For , the probability that is set to 1 is given by:
With this definition, randomized response provides -edge local differential privacy.
We can construct a graph based on the collection of obfuscated adjacency vectors obtained from all users. Using the statistics of the obfuscated graph , we can then publish various information, including the number of subgraphs [34, 22, 23, 20]. However, randomized response produces biased results, making it less suitable for counting queries. This bias can be fixed by the subsequent definition.
Definition 7 (Unbiased randomized response query [16]).
Let and be the adjacency vector published through randomized response with budget by user . Then, for all ,
is an unbiased estimator of . Additionally, for , is independent of , and . We refer to a query that publishes as the unbiased randomized response query. This query is -edge locally differentially private.
We can use the results from the unbiased randomized response query to calculate the number of subgraphs. For example, without privacy constraints, the number of triangles can be calculated as . To privately estimate the number of triangles, we use . It is theoretically shown in [16] that the estimator has a smaller -error compared to the estimator obtained from the randomized response query, .
2.5 Graph Arboricity and Degeneracy
Graph arboricity and degeneracy can be defined as follows:
Definition 8 (Arboricity).
The arboricity of a graph is the minimal number such that the edges of can be partitioned into forests.
Definition 9 (Degeneracy).
The degeneracy of a graph is the smallest number such that any subgraph of , contains at least one node with induced degree at most .
We observe that the variable is frequently used as a privacy parameter in differential privacy. However, since we do not consider that parameter in this paper, we choose to use to represent degeneracy, which is also a common convention. When the context is clear, we will drop the of the notation and simply write and . The two quantities are linked by the following theorem.
Theorem 10 (equation 3 and lemma 2.2 of [37]).
In any graph , degeneracy and arboricity satisfy
The arboricity has previously been used outside of the differential private community to bound some graph statistics. A folklore useful result is that the number of edges in a graph is smaller than . Another well known result is as follows:
Theorem 11 (Chiba-Nishizeki Bound [6]).
With and the degree of node , then
3 Node-Reordered Graphs and Their Properties
The first step of our mechanism is to order the vertices based on their estimated degree. The algorithm for this step is shown in Algorithm 1. At Line 2 of the algorithm, we privately publish the estimated degree. Under edge local differential privacy, the global sensitivity of the degree is 1. Therefore, we can use the Laplacian query (Definition 3) with noise scaled to to publish the degree, where is the privacy budget allocated to this step. We denote the estimated degree as .
After publishing the estimated degrees, in Line 3, we assign an order to the nodes based on their degrees, which we refer to as a low degree ordering. For , we denote the reordered graph as , where and for all . The edge set is defined as . We note that and are isomorphic, and thus have the same number of subgraphs. We denote by the degree of in and the number of neighbors of node in the set .
Symbol | ||||
---|---|---|---|---|
Representation | ||||
Bound | [25] |
In the remainder of this section, we analyze the properties of graphs produced by the reordering. Specifically, our focus is on bounding the frequency of certain substructures within the reordered graph. A summary of the results from this section is provided in Table 2.
Definition 12 (low star).
For , a low--star is a subgraph consisting of a central node and neighboring nodes, where at least one of the neighboring nodes has an index smaller than that of the central node. We denote by the number of such subgraphs contained in a graph .
Theorem 13.
.
Proof.
Let be the set of neighbors of in . We have that:
Let denote the noise added to the estimated degree of user . For each edge , their ranks can only be exchanged if the sum of the errors in both degree estimations exceeds the gap between the two degrees. Therefore, the quantity satisfies
Using this inequality, we can rewrite the count of as
Since is sampled from , we have that follows an exponential law of expectation . Hence,
Since is isomorphic to , and using Theorem 11 it follows that
Since and , this gives .
In addition to the ordered stars we just discussed, arboricity can also be used to bound the number of paths and cycles in a graph, as demonstrated in the following lemma and theorem. Recall that is the number of paths with length in the graph .
Lemma 14.
For any positive integer , .
Proof.
We first consider . Let be a function that maps a path of length to a tuple of edges, defined as . We observe that, for any tuple of edges denoted by , is either a set containing one path or an empty set. There is at most one path that uses as the -th edge of the path for all . Thus, we can conclude that the number of paths of length is at most the number of sets of edges, which is .
Next, let us consider . Let be a function that maps a path of length to a tuple of edges, defined as . We observe that, for any tuple of edges denoted by , is a set of size no larger than . There is at most one path of length that uses as the -th edge of the path, and there are at most possible ways to extend a path of length to a path of length . Hence, .
Recall that is the number of cycles with size in the graph . We obtain the following theorem.
Theorem 15.
For any , .
Proof.
Let us denote the number of paths of length that have node as an extremity and the number of cycles of length containing edge . Using these notations, we have . Consider the number . For a path of length that has a node as a terminal, there is at most one cycle of length which includes this path and the edge . Therefore, we conclude that . Similarly, we have . Hence,
For any function such that for all , is equal to either or , . By definition of the arboricity, there exists a set of disjoint forests such that . By choosing a root for each tree of these forests, we can introduce a function such that each edge has its child node as an image. In this way, each node can only be the image of one edge per forest. This leads to
The last step is justified by the fact that each path having two extremities, the sum of all the paths of length starting with node is twice the number of paths of length .
Combining Lemma 14 and Theorem 15, we obtain the following corollary111We note that this result was independently established in [25] by a different proof. We are grateful to the anonymous reviewer for bringing this to our attention..
Corollary 16.
For , and .
Next, we focus on the number of cycles of length for any , in which three consecutive vertices of the cycle exhibit monotonic ranks , as illustrated in Table 2. Throughout the rest of this article, we will denote the count of such subgraphs in by , omitting from the notation when the context is clear. In the following theorem, for simplicity, we extend the notation by assuming and for every graph .
Theorem 17.
For , .
Proof.
Let represent the number of subgraphs in where three consecutive vertices exhibit monotonic ranks, with being the edge immediately following these consecutive vertices. Also, for , let the number of paths of length with a low-2-star as one of its extremities be denoted as . Since we can construct at most one path included in where a low-2-star and a path of length are its extremities, we obtain the inequality .
Let be a cycle which is counted in . Consider the path in of length starting from that does not pass through and the other path in of the same length starting from that does not pass through . We observe that one extremity of the two paths is a low-2-star. Hence, when is the number of paths in the count of that have as an extremity. Using the same definition of as in the proof of Theorem 15, we have
Corollary 18.
For , .
The next corollary considers the number of edge sets in with specific properties.
Corollary 19.
For any , we consider edge sets of size such that 1) for some , there exists a set of cycles in where and for , and 2) at least one of contains three consecutive vertices of monotonic index. The number of such edge sets is .
Proof.
Consider a partition of , denoted by , where . The number of such partitions is a function of and can be considered constant. We will demonstrate that the number of cycle sets satisfying the conditions in the corollary statement, with , is at most . Therefore, the number of cycle sets satisfying the corollary statement is no more than .
To prove the bound, we will consider two cases: either all the cycles have even lengths, or at least two of them have odd lengths, given that the total number of edges is even.
If all the cycles are of even length, then, for some one of them is of length and includes 3 consecutive vertices of monotonic index. By Corollary 18, there are possibilities for this cycle. For the remaining cycles, Corollary 16 tells us that the number of admissible configurations is bounded by . In total, this gives a bound. If at least two cycles have odd lengths, say and , then by Corollary 16, the number of possible configurations for these cycles can be bounded by for the first cycle and for the second cycle, and for the remaining cycles. Overall, this results in a bound of .
4 Triangle Counting Algorithm
We propose Algorithm 2 to count the number of triangles based on the ordering and properties discussed in the previous section. First, we execute Algorithm 1 at Line 2. Next, at Line 3, we use the randomized response query to obtain an obfuscated graph. From Lines 4 to 8, we employ the Laplacian query with restricted sensitivity on (Definition 5) to estimate the number of triangles associated with User . Finally, at Line 9, we sum all the estimates and report the total as the estimated triangle count. We adopt the concept from [22] of distributing randomized response results to all nodes and having each node estimate its number of triangles. However, the other algorithmic ideas presented in this work are novel. In the following theorem, we demonstrate that our algorithm is differentially private.
Theorem 20.
Algorithm 2 provides -edge local differential privacy.
Proof.
For all possible executions of Algorithm 2, it inquires three queries to all users. They are 1) the Laplacian query with privacy budget inside the GetOrderingfunction at Line 2, 2) the unbiased randomized response query with privacy budget at Line 3, and 3) the Laplacian query with restricted sensitivity on at Lines 4-8.
To prove this theorem, we only need to show that the query at Lines 4-8 is -edge local differentially private. The query aims to publish . By the unbiased randomized response in Line 3, we have that, for any , . It can be shown that, for (defined in Definition 5) such that , the number of different elements in the set obtained from , at Line 6 is at most . Therefore, the restricted sensitivity of the function (denoted by in Definition 5) is not larger than . Hence, by Definition 5, the publication of at Line 8 is -edge local differentially private.
We now discuss the accuracy of our estimation and its relation with the parameter appearing at Line 4 the algorithm. We will see that controls the trade off between the bias and the accuracy. The smaller is, the smaller the average noise gets, but the larger the probability of bias and its expected magnitude is.
In the following lemma, we discuss that the projection applied at Line 5 changes the adjacency vector only with small probability.
Lemma 21.
For any , with probability at least , for all .
Proof.
Using the cumulative distribution function of the Laplacian random variable, we have . Thus, by taking this inequality for all , and using the union bound, we obtain .
We show that our estimation has no bias with high probability in the subsequent theorem.
Theorem 22.
With probability at least , algorithm 2 provides an unbiased estimate of the number of triangles in the graph, i.e.
Proof.
As discussed in Definition 7, we have that . Using Lemma 21, with probability at least , is larger than for all , and the function has no effect. Consequently, precisely represents the set of forks centered on node , encompassing all possible triangles. Therefore, is an unbiased estimate of the number of triangles such that . Given that Laplace noise is centered and triangles can be decomposed accordingly, is an unbiased estimation of .
Corollary 23 ensures that even in the unlikely event of some clipping occurring, the resulting bias would still represent only a small fraction of the actual count.
Corollary 23.
The expected value of the bias of Algorithm 2 is bounded by .
Proof.
When the corrected estimated degree is smaller than the actual degree , edges are excluded. This exclusion introduces a bias because the potential triangles involving these excluded edges are not counted. For each user and their neighbor , let denote the number of triangles counted by user that involve the edge . We also define . Then, the maximum bias resulting from a single clipped edge can be bounded by .
The expected number of clipped edges for user is determined by evaluating the following integral, where serves as the correction term for the degree:
We obtain the final result by combining these elements and observing that .
The accuracy of our estimation is demonstrated in the subsequent theorem.
Theorem 24.
When , the squared expected -error of algorithm 2 is bounded by
Proof.
The squared -error can be decomposed into the square of the bias plus the variance. We have established in Corollary 23 that the bias of the algorithm is bounded by . We will now focus on bounding the variance of the algorithm. This variance arises from two distinct sources: the randomized response query and the Laplacian query with restrictive sensitivity.
Regarding the noise introduced by the Laplacian query with restrictive sensitivity, its variance is simply the sum of the variances of each term, which is
Next, we consider the variance from the randomized response query. In the following equations, we use the notation to denote the set of neighbors of both and such that . Note that by including one node from along with and , a triple in is formed. Similarly, including two nodes from along with and results in a quadruplet in . We also notice from Definition 7 that, for , is independent to and . Hence,
In the previous work [16], the number of terms in the variance calculation is bounded by the number of cycles of length four, which is . We reduce that number to using the GetOrderingfunction in Line 2 and by including only pairs such that . It is known that and, in many practical graphs, the degeneracy is much smaller than the maximum degree.
5 Odd Length Cycle Counting
In this section, we will describe how to utilize low-degree ordering to accurately count odd-length cycles in graphs with bounded degeneracy. Some concepts are extended from the previous section. As shown in Algorithm 3, the algorithm for estimating the number of odd-length cycles is similar to Algorithm 2, except that the restricted sensitivity at Line 9 is larger, and at Line 8, we replace with an estimate for the number of paths under specific constraints. We discuss the privacy of the algorithm in the subsequent theorem. The main challenge of the proof is to demonstrate that the Laplacian query under restricted sensitivity at Lines 5-9 is -differentially private.
Theorem 25.
Algorithm 3 provides -edge local differential privacy.
Proof.
We need to demonstrate that Lines 5-9 of the algorithm, involving the Laplacian query with restricted sensitivity on , ensure -differential privacy. Following the arguments of Theorem 20, we assert that altering entries of changes the set by at most elements. A single element change in can alter the value of by
Therefore, the restricted sensitivity of is . Consequently, the publication of at Line 9 is -differentially private.
The bias of the algorithm is given in the following theorem.
Theorem 26.
With a probability of at least , Algorithm 3 provides an unbiased estimate of the number of -cycles in any graph for any odd integer .
Proof.
Since we publish using the unbiased randomized response query, the publication is an unbiased estimation of . Furthermore, as those estimators are independent from one another, for each and , is an unbiased estimate of . It results from this that is an unbiased estimator of the number of paths between and with length such that, for any three consecutive nodes with monotonic ranks, the node has a lower rank that . We denote the number of such paths as .
Let us introduce . Assuming no clipping occurs, which happens with a probability of at least , we have by linearity of expectation that both and are unbiased estimators of . Therefore, all that remains to be proven is that the number of -cycles in is equal to . It is evident that for each element counted in , there is a corresponding cycle in and, also, in .
Conversely, consider a cycle of length in . Since it is also a cycle in , we can represent it in as . Because the cycle is of odd length, there exist three consecutive nodes with a monotonic rank. Among all possible triplets, consider the one where the central node has the smallest rank, denoted as with . Furthermore, let and , and assign the indices of the other nodes in the cycle to through in the order they appear in the cycle. Thus, the cycle is counted in . Furthermore, if any other node in the cycle were chosen as , the remaining path would not be part of . This ensures that each cycle is counted exactly once in .
Finally, the -error of Algorithm 3 is proven in the next theorem. The most challenging aspect of this theorem is to bound the covariance in the summation at Lines 8 and 10. We assert that any two dependent elements of can be considered as a set containing an even number of edges which forms multiple disjoint cycles with specific properties. Consequently, we can utilize our results from Corollary 19 to bound the number of such pairs. The proof of the theorem is given in the appendix of this paper.
Theorem 27.
When , the expected squared -error of algorithm 3 is bounded by
Before proving Theorem 27, we demonstrate the following lemma.
Lemma 28.
The expected value of the bias of Algorithm 3 is bounded by .
Proof.
We have already seen the proof of Corollary 23 that the expected value of the number of clipped edges for user was bounded by . We now have to bound the bias created by one edge removal, i.e. the maximal number of cycles one edge can part of.
With the number of cycles counted by that involve edge , the maximal bias for user is bounded by , and the bias of the algorithm by .
Now, we are ready to prove Theorem 27.
Proof of Theorem 27.
The squared -error can be decomposed into the square of the bias plus the variance. In Lemma 28, we established that the bias of the algorithm is bounded by . We will now focus on bounding the variance of the algorithm.
Let the indicator variable be 1 if the path exists in , and 0 otherwise. We also denote the random variable by . Finally, we define . This random variable has the properties that and .
Similar to the case with triangles, the variance of Algorithm 3 arises from both the unbiased randomized response query and the Laplacian query with restricted sensitivity.
Concerning the variance term coming from the randomized response, we have to compute the variance of
We have to take into account the term that comes from the sum of the variances of the as well as the one coming from the covariances between them.
To compute the sum of variances, we start with:
Additionally, for each and , the cardinality of is bounded by , and the number of ways to choose is bounded by , which is by Theorem 13. This contributes a term in the variance from the sum of variances bounded by .
To analyze the term arising from the covariances, we first examine the covariance between and . In the following equations, let be the set of edges that appear only in or , and let be the set of edges that appear in both. Recall that, for any and .
We observe that the covariance between and is zero if the paths and do not share at least one common edge or if the edges present in only one of the paths are not present in the original graph. Now consider the situation where the covariance is non-zero. We have that . Additionally, we will denote the node responsible for counting this instance of and the one responsible for .
Let , and let
In other words, the set consists of the edges in the paths and , along with the additional edges , , , and . Additionally, let
Similarly, let
In other words, the sets and are the sets and extended to include the additional edges , , , and .
We introduce the difference between the cardinal of and , . Let be the cardinality of . In this case, the covariance is . We have that , which gives .
In the next step, we will calculate the number of the pairs of paths with . Let us consider the degree of each node in . It is clear that the degrees are neither greater than four nor less than two. A node has a degree of three only if one of the three edges incident to it belongs to and the other two to . A node has a degree of four if all four edges incident to it are in , and it has a degree of two if both edges incident to it are either in or in . Hence, if we consider the graph , we have a graph of degree two or four, which is a union of multiple disjoint cycles.
Let the number of those disjoint cycles be and the size of those cycles be . We have that , i.e. is a partition of . We know that the number of such partitions is bounded by a function of . Let suppose that the bound is .
Let us give the number of with cycle size . We can use Corollary 16 to show that the number of such sets is . When , we know that and are in . There are three consecutive nodes with monotonic ranks in the union of disjoint cycles . Hence, we can use Corollary 19 to show that the number of such sets is bounded by . By combining the two cases, we can conclude that the number of possible sets with cycle size is at most . The number of possible is then . As is a constant, the number is .
We then consider the number of configurations for , which consists of a union of disjoint paths. Let the number of paths be and their lengths be . We have that , and forms a partition of . The number of possible partitions is bounded by a function of , denoted as . Each part must begin and end in the node set , where . Therefore, the number of possible paths is at most , and the number of possible sets with the partition is at most . Hence, the total number of possible sets is .
Consequently, for each set , the number of possible configurations for is at most . The number of pairs of paths and with is then at most . Each of these pairs contributes to the covariance sum.
The covariance of can then be calculated as follows:
Since this bound is larger than the one for the sum of variances, we can disregard the latter.
To compute the variance resulting from the Laplacian query with restricted sensitivity, we sum the variance of the Laplacian distribution for all nodes:
(1) |
Let us now consider the expected value
(2) |
By Lemma 14 and the fact that is an odd number, we have . The variance can be decomposed into the sum of the variances of each path, which is bounded by , and the sum of covariances.
The covariance is non-zero only if at least two edges are shared between the two paths and all edges that appear only once exist in the original graph. As previously discussed, this forms a cycle structure, except for the path extremities that do not need to be connected. Recall the definitions of the sets and from the previous paragraph.
The set consists of two paths at the extremities and multiple disjoint cycles. Suppose the number of edges in is , the number of edges in the two paths are and , and the number of disjoint cycles is , with the number of edges in these cycles being . This gives us . In other words, forms a partition of . The number of such partitions is bounded by a function of . Let the bound be .
We now discuss the number of possible configurations of for the partition . From Lemma 14 and Corollary 16, the number of cycles of length is bounded by , and the number of paths of length is bounded by . Thus, the number of configurations for the partition is:
Hence, the number of possible configurations for with edges is no more than .
The number of edges in is . Using the previous argument when calculating the number of possible set , we obtain that the number of configurations for is . The number of configurations with is then . Hence, the overall number of combinations is .
From the previous paragraph, we observe that the covariance term outweighs the sum of the variances, leading to . Additionally, when calculating in (2), it is evident that dominates , resulting in . Substituting with in (1), we find that the variance from the Laplacian mechanism is bounded by
We obtain the theorem result by summing the variance from the unbiased randomized response query and the variance from the Laplacian query with restricted sensitivity.
6 Conclusion
In this work, we introduced a private vertex ordering algorithm. The transformation on the graph induced by this ordering reduces the count of specific order-sensitive motifs while preserving the overall graph structure. Due to its reliance on the Laplacian mechanism, the algorithm performs well even in high-privacy settings, making it an excellent preprocessing step for subgraph counting queries.
Within this framework, we first propose a new triangle counting algorithm whose accuracy depends on the count of specific ordered subgraphs. By combining this algorithm with the ordering preprocessing step, we achieve an expected error of for graphs with bounded degeneracy, compared to the error seen in the current state of the art.
Subsequently, we extended the algorithm to address the more general case of odd-length cycle counting. We propose the first purely local differentially private counting algorithm specifically designed for cycles longer than triangles. Under the assumption of bounded degeneracy, the algorithm achieves an error of for cycles of length .
Due to the constraints of local differential privacy, it might be assumed that the range of tasks we can perform on graphs under this privacy notion is limited. However, in this work, we demonstrate that more precise information can be published under local differential privacy by restricting our inputs to certain types of graphs. We believe that parameterized algorithms under local differential privacy represent an intriguing research area that can contribute significantly to both algorithm design and information privacy.
One limitation of this method is that the relative error can become significantly large when the number of cycles is small (or even zero), even in cases where the graph’s degeneracy – and consequently the -error of our algorithm – is high. Identifying a class of graphs for which an algorithm with bounded relative error can be designed would be a direction for future research. Another question for future investigation is determining lower bounds for degeneracy-bounded graphs under the local differential privacy.
References
- [1] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. science, 286(5439):509–512, 1999.
- [2] Suman K. Bera, Lior Gishboliner, Yevgeny Levanzov, C. Seshadhri, and Asaf Shapira. Counting subgraphs in degenerate graphs. ACM Journal of the ACM (JACM), 69(3):23:1–23:21, 2022. doi:10.1145/3520240.
- [3] Louis Betzer, Vorapong Suppakitpaisarn, and Quentin Hillebrand. Publishing number of walks and katz centrality under local differential privacy. In The 40th Conference on Uncertainty in Artificial Intelligence, 2024. URL: https://openreview.net/forum?id=76UkTmdmkB.
- [4] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. Differentially private data analysis of social networks via restricted sensitivity. In Innovations in Theoretical Computer Science, ITCS ’13, Berkeley, CA, USA, January 9-12, 2013, pages 87–96. ACM, 2013. doi:10.1145/2422436.2422449.
- [5] Marco Bressan. Faster algorithms for counting subgraphs in sparse graphs. Algorithmica, 83(8):2578–2605, 2021. doi:10.1007/s00453-021-00811-0.
- [6] Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210–223, 1985. doi:10.1137/0214017.
- [7] Graham Cormode, Somesh Jha, Tejas Kulkarni, Ninghui Li, Divesh Srivastava, and Tianhao Wang. Privacy at scale: Local differential privacy in practice. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, pages 1655–1658. ACM, 2018. doi:10.1145/3183713.3197390.
- [8] Damien Desfontaines and Balázs Pejó. Sok: Differential privacies. Proc. Priv. Enhancing Technol., 2020(2):288–313, 2020. doi:10.2478/popets-2020-0028.
- [9] Laxman Dhulipala, George Z. Li, and Quanquan C. Liu. Near-optimal differentially private k-core decomposition. CoRR, abs/2312.07706, 2023. doi:10.48550/arXiv.2312.07706.
- [10] Laxman Dhulipala, Quanquan C. Liu, Sofya Raskhodnikova, Jessica Shi, Julian Shun, and Shangdi Yu. Differential privacy from locally adjustable graph algorithms: k-core decomposition, low out-degree ordering, and densest subgraphs. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 754–765. IEEE, 2022. doi:10.1109/FOCS54457.2022.00077.
- [11] Michael Dinitz, Satyen Kale, Silvio Lattanzi, and Sergei Vassilvitskii. Improved differentially private densest subgraph: Local and purely additive. CoRR, abs/2308.10316, 2023. doi:10.48550/arXiv.2308.10316.
- [12] Pål Grønås Drange, Patrick Greaves, Irene Muzi, and Felix Reidl. Computing complexity measures of degenerate graphs. In 18th International Symposium on Parameterized and Exact Computation, IPEC 2023, September 6-8, 2023, Amsterdam, The Netherlands, volume 285 of LIPIcs, pages 14:1–14:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.IPEC.2023.14.
- [13] Cynthia Dwork. Differential privacy. In Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II, volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer, 2006. doi:10.1007/11787006_1.
- [14] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer, 2006. doi:10.1007/11681878_14.
- [15] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014. doi:10.1561/0400000042.
- [16] Talya Eden, Quanquan C. Liu, Sofya Raskhodnikova, and Adam D. Smith. Triangle counting with local edge differential privacy. In 50th International Colloquium on Automata, Languages, and Programming, ICALP 2023, July 10-14, 2023, Paderborn, Germany, volume 261 of LIPIcs, pages 52:1–52:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ICALP.2023.52.
- [17] Alexandre V. Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA, pages 211–222. ACM, 2003. doi:10.1145/773153.773174.
- [18] Anupam Gupta, Katrina Ligett, Frank McSherry, Aaron Roth, and Kunal Talwar. Differentially private combinatorial optimization. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 1106–1125. SIAM, 2010. doi:10.1137/1.9781611973075.90.
- [19] Michael Hay, Chao Li, Gerome Miklau, and David D. Jensen. Accurate estimation of the degree distribution of private networks. In ICDM 2009, The Ninth IEEE International Conference on Data Mining, Miami, Florida, USA, 6-9 December 2009, pages 169–178. IEEE Computer Society, 2009. doi:10.1109/ICDM.2009.11.
- [20] Quentin Hillebrand, Vorapong Suppakitpaisarn, and Tetsuo Shibuya. Communication cost reduction for subgraph counting under local differential privacy via hash functions. CoRR, abs/2312.07055, 2023. doi:10.48550/arXiv.2312.07055.
- [21] Quentin Hillebrand, Vorapong Suppakitpaisarn, and Tetsuo Shibuya. Unbiased locally private estimator for polynomials of laplacian variables. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6-10, 2023, pages 741–751. ACM, 2023. doi:10.1145/3580305.3599537.
- [22] Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. Locally differentially private analysis of graph statistics. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, pages 983–1000. USENIX Association, 2021. URL: https://www.usenix.org/conference/usenixsecurity21/presentation/imola.
- [23] Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. Communication-efficient triangle counting under local differential privacy. In 31st USENIX Security Symposium, USENIX Security 2022, Boston, MA, USA, August 10-12, 2022, pages 537–554. USENIX Association, 2022. URL: https://www.usenix.org/conference/usenixsecurity22/presentation/imola.
- [24] Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. Differentially private triangle and 4-cycle counting in the shuffle model. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022, pages 1505–1519. ACM, 2022. doi:10.1145/3548606.3560659.
- [25] George Manoussakis. Listing all fixed-length simple cycles in sparse graphs in optimal time. In Fundamentals of Computation Theory - 21st International Symposium, FCT 2017, Bordeaux, France, September 11-13, 2017, Proceedings, volume 10472 of Lecture Notes in Computer Science, pages 355–366. Springer, Springer, 2017. doi:10.1007/978-3-662-55751-8_28.
- [26] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA, Proceedings, pages 94–103. IEEE Computer Society, 2007. doi:10.1109/FOCS.2007.41.
- [27] Jaroslav Nesetril and Patrice Ossona de Mendez. Sparsity - Graphs, Structures, and Algorithms, volume 28 of Algorithms and combinatorics. Springer, 2012. doi:10.1007/978-3-642-27875-4.
- [28] Iyiola E. Olatunji, Thorben Funke, and Megha Khosla. Releasing graph neural networks with differential privacy guarantees. Trans. Mach. Learn. Res., 2023, 2023. URL: https://openreview.net/forum?id=wk8oXR0kFA.
- [29] Zhan Qin, Ting Yu, Yin Yang, Issa Khalil, Xiaokui Xiao, and Kui Ren. Generating synthetic decentralized social graphs with local differential privacy. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, pages 425–438. ACM, 2017. doi:10.1145/3133956.3134086.
- [30] Sofya Raskhodnikova and Adam D. Smith. Differentially private analysis of graphs. In Encyclopedia of Algorithms, pages 543–547. Springer, 2016. doi:10.1007/978-1-4939-2864-4_549.
- [31] Sina Sajadmanesh and Daniel Gatica-Perez. Locally private graph neural networks. In CCS ’21: 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, November 15 - 19, 2021, pages 2130–2145. ACM, 2021. doi:10.1145/3460120.3484565.
- [32] Yue Wang, Xintao Wu, and Donghui Hu. Using randomized response for differential privacy preserving data collection. In Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference, EDBT/ICDT Workshops 2016, Bordeaux, France, March 15, 2016, volume 1558 of CEUR Workshop Proceedings. CEUR-WS.org, 2016. URL: https://ceur-ws.org/Vol-1558/paper35.pdf.
- [33] Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
- [34] Qingqing Ye, Haibo Hu, Man Ho Au, Xiaofeng Meng, and Xiaokui Xiao. Towards locally differentially private generic graph metric estimation. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020, pages 1922–1925. IEEE, 2020. doi:10.1109/ICDE48307.2020.00204.
- [35] Qingqing Ye, Haibo Hu, Man Ho Au, Xiaofeng Meng, and Xiaokui Xiao. LF-GDPR: A framework for estimating graph metrics with local differential privacy. IEEE Trans. Knowl. Data Eng., 34(10):4905–4920, 2022. doi:10.1109/TKDE.2020.3047124.
- [36] Hailong Zhang, Sufian Latif, Raef Bassily, and Atanas Rountev. Differentially-private control-flow node coverage for software usage analysis. In 29th USENIX Security Symposium, USENIX Security 2020, August 12-14, 2020, pages 1021–1038. USENIX Association, 2020. URL: https://www.usenix.org/conference/usenixsecurity20/presentation/zhang-hailong.
- [37] Xiao Zhou and Takao Nishizeki. Graph coloring algorithms. IEICE Transactions on Information and Systems, 83(3):407–417, 2000.