Abstract 1 Introduction 2 Preliminaries 3 Node-Reordered Graphs and Their Properties 4 Triangle Counting Algorithm 5 Odd Length Cycle Counting 6 Conclusion References

Cycle Counting Under Local Differential Privacy for Degeneracy-Bounded Graphs

Quentin Hillebrand ORCID The University of Tokyo, Japan Vorapong Suppakitpaisarn ORCID The University of Tokyo, Japan Tetsuo Shibuya ORCID The University of Tokyo, Japan
Abstract

We propose an algorithm for counting the number of cycles under local differential privacy for degeneracy-bounded input graphs. Numerous studies have focused on counting the number of triangles under the privacy notion, demonstrating that the expected 2-error of these algorithms is Ω(n1.5), where n is the number of nodes in the graph. When parameterized by the number of cycles of length four (C4), the best existing triangle counting algorithm has an error of O(n1.5+C4)=O(n2). In this paper, we introduce an algorithm with an expected 2-error of O(δ1.5n0.5+δ0.5dmax0.5n0.5), where δ is the degeneracy and dmax is the maximum degree of the graph. For degeneracy-bounded graphs (δΘ(1)) commonly found in practical social networks, our algorithm achieves an expected 2-error of O(dmax0.5n0.5)=O(n). Our algorithm’s core idea is a precise count of triangles following a preprocessing step that approximately sorts the degree of all nodes. This approach can be extended to approximate the number of cycles of length k, maintaining a similar 2-error, namely O(δ(k2)/2dmax0.5n(k2)/2+δk/2n(k2)/2) or O(dmax0.5n(k2)/2)=O(n(k1)/2) for degeneracy-bounded graphs.

Keywords and phrases:
Differential privacy, triangle counting, degeneracy, arboricity, graph theory, parameterized accuracy
Funding:
Quentin Hillebrand: partially supported by KAKENHI Grant 20H05965, and by JST SPRING Grant Number JPMJSP2108.
Vorapong Suppakitpaisarn: partially supported by KAKENHI Grant 21H05845 and 23H04377.
Tetsuo Shibuya: partially supported by KAKENHI Grant 20H05967, 21H05052, and 23H03345.
Copyright and License:
[Uncaptioned image] © Quentin Hillebrand, Vorapong Suppakitpaisarn, and Tetsuo Shibuya; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Security and privacy Privacy-preserving protocols
; Theory of computation Graph algorithms analysis
Acknowledgements:
The authors wish to express their thanks to the anonymous reviewers whose valuable feedback greatly enhanced the quality of this paper.
Editors:
Olaf Beyersdorff, Michał Pilipczuk, Elaine Pimentel, and Nguyễn Kim Thắng

1 Introduction

In recent years, differential privacy [13, 15] has become the gold standard for providing strong privacy guarantees while enabling meaningful data analysis. Differential privacy ensures that the output of a computation does not significantly change when any single individual’s data is modified, thus safeguarding individual privacy. While much of the initial work in differential privacy focused on traditional tabular data [14, 26], there is increasing interest in extending these privacy guarantees to graph data [31, 35], which presents its own unique set of challenges.

Differential privacy has evolved into numerous variants to accommodate different scenarios, as detailed in [8]. Of particular interest to us is the concept of local differential privacy [7, 17]. This variant is unique in that it does not rely on the assumption of a trusted central server. Instead, users must obfuscate their private data before sharing it with an untrusted computing entity. In the context of graph data, the most commonly adopted notion is edge local differential privacy [29], where the sensitive information of each user pertains to their connections with others.

A widely used obfuscation method is randomized response [33, 32]. In this approach, users invert each bit of their adjacency vector with a certain probability. The server then collects this distorted information to construct an obfuscated graph. Although it is possible to publish various graph statistics from the obfuscated graph, the resulting information tends to be imprecise. Algorithms specifically designed to publish particular statistics typically yield more accurate and useful graph information.

Table 1: Upper and lower bounds of the expected 2-error for triangle and k-cycle counting under the local differential privacy. “Interactive” refers to a scenario in which multiple rounds of communication between the server and the clients are permitted.
Upper Bound Lower Bound
Triangle O(n2) ([22], general graphs) Ω(n1.5) (non-interactive) [24]
O(dmax1.5n0.5) ([16], general graphs) Ω(n1.5) (interactive) [16]
O(dmax0.5n0.5) (this work, degeneracy-bounded graph) Ω(n2) (non-interactive) [16]
Odd length O(nk1) (folklore, general graphs)
cycles Ck O(n(k1)/2) (this work, degeneracy-bounded graph)

One graph statistic frequently considered by researchers in local differential privacy is the number of subgraphs [22, 20]. Specifically, many studies have focused on the publication of triangle counts [22, 23, 20, 16]. Theoretical analysis results on the 2-error are summarized in Table 1. Unfortunately, to date, when n is the number of nodes and dmax is the maximum degree of the input graphs, the best algorithm has an expected 2-error of O(n2) or O(dmax1.5n0.5). We believe that this error is too large for many applications and should be improved. On the other hand, it has been shown that for all locally differentially private algorithms, there exists a class of graphs where the 2-error is Ω(n1.5) [16]. This lower bound implies that the expected 2-error cannot be significantly improved.

1.1 Our Contribution

This motivates us to consider a specific class of graphs. In this paper, we specifically focus on graphs with bounded degeneracy, as most social graphs exhibit degeneracy values that are substantially smaller than both the number of vertices and the maximum degree. The table in the Appendix of [12] provides statistics on a diverse range of graphs, detailing the number of nodes, degeneracy, and maximum degree. As shown in the paper, for sufficiently large graphs, degeneracy is consistently at least an order of magnitude smaller than the maximum degree, and in some instances, several orders of magnitude smaller.

Additionally, several synthetic graph models commonly considered realistic naturally produce graphs with low degeneracy. Examples include preferential attachment graphs [1] and bounded expansion graphs [27].

Degeneracy is particularly significant in parameterizing the complexity of subgraph counting algorithms, as demonstrated in [6, 2, 5]. Given the relationship between computational complexity and estimation error in triangle counting algorithms, degeneracy is an important parameter for characterizing accuracy.

Let the graph degeneracy be δ. We propose a locally differentially private algorithm with an expected 2-error of O(δ1.5n0.5+δ0.5dmax0.5n0.5). When the graph degeneracy is bounded (δ=O(1)), the expected 2-error becomes O(dmax0.5n0.5)=O(n). This result implies that our expected error for the degeneracy-bounded graphs can be smaller than the lower bound for general graphs.

We also extend our results to count the number of cycles with odd lengths in degeneracy-bounded graphs. To our knowledge, there are only two local differentially private algorithms proposed for counting subgraphs of more than three nodes. The first algorithm [24] is designed to count the number of four-length cycles but operates within the shuffle model, which is weaker than the original local differential privacy model. The second algorithm counts the number of walks of length k [3]. This field has limited work due to the significant noise introduced to ensure user privacy, which accumulates as the subgraph size increases. This accumulation results in unacceptable errors for differential privacy in larger subgraphs. For instance, while the expected 2-error from triangle counting algorithms based on randomized response is O(n2) [24], the expected 2-error for similar algorithms estimating the number of Ck is as high as O(nk1). In other words, the error increases by a factor of n with each increment in cycle length.

In this work, we propose an algorithm that significantly reduces the expected 2-error to O(n(k1)/2) in degeneracy-bounded graphs. We believe that this error is much smaller than the actual number of cycles in most graphs. Consequently, our algorithm is the first to publish a meaningful number of large cycles under local differential privacy.

1.2 Technical Overview

In this section, we provide an overview of the technical concepts behind our triangle counting algorithm. The algorithm for counting odd-length cycles, for k5, extends these ideas but requires a more intricate and detailed analysis.

Let the input graph be G=(V={ν1,,νn},E). In prior work [22], they apply a randomized response mechanism that flips each bit in the adjacency matrix with a certain probability. Let the resulting graph after applying the randomized response be G=(V,E). In the local differential privacy setting, each node νi knows whether it is connected to another node νj (where νjνi) if {νi,νj}E. For the triangle counting method, node νi considers (νi,νj,νκ) as a triangle if {νi,νj}E, {νi,νκ}E, and {νj,νκ}E. Define ei,j,κ=1 if node νi considers (νi,νj,νκ) as a triangle, otherwise set ei,j,κ=0. Define Si={(j,κ):{νi,νj},{νi,νκ}E and j<κ}. The estimated number of triangles for node νi, reported by the user, is t~i=(j,κ)Siei,j,κ. The total estimated number of triangles in the graph is then f~Δ(G)=13it~i=13i(j,κ)Siei,j,κ, where dividing by three corrects for the fact that each triangle is counted once by each of the three users forming it (i.e., triple-counted), ensuring each triangle is counted only once.

The 2-error of the estimated triangle count f~Δ(G) mostly arises from the variance in the estimation. A significant portion of this variance comes from the covariance between pairs of variables in the summation 13i(j,κ)Siei,j,κ. Two variables, ei,j,κ and ei,j,κ, are dependent if (j,κ)=(j,κ). The number of dependent pairs in the counting process is equivalent to the number of tuples (νi,νj,νi,νκ) such that (j,κ)SiSi, which corresponds to the number of 4-cycles in the input graph G. Therefore, the squared 2-error is approximately proportional to the number of 4-cycles in the graph, which is O(n4).

Let us assume that the indices of all users are predetermined and publicly known before the counting process begins. Define Si={(j,κ):{νi,νj},{νi,νκ}E and j<i<κ}. If node i only considers the pairs (j,κ) within Si, then each triangle is counted exactly once. The estimated number of triangles, f^Δ(G), can be calculated as f^Δ(G)=it^i, where t^i=(j,κ)Siei,j,κ. In this counting method, the number of dependent variable pairs is at most the number of 4-cycles that contain the three nodes νi,νj,νκ with j<i<κ.

Let δ represent the degeneracy of the input graph G, and for each νV, let d(ν) denote the degree of ν. Assume that the degrees of all nodes are publicly known, and the nodes are indexed in non-decreasing order of their degree, i.e., if i>j, then d(νi)d(νj). Referring to the bound established by Chiba and Nishizeki [6], which states that (νi,νj)Emin(di,dj)O(δ|E|), we demonstrate in this paper that the number of such cycles is O(δ3n). Consequently, the squared 2-error is reduced from O(n4) in previous work to O(δ3n).

However, we cannot assume that the degrees of all nodes are publicly known, as this information is sensitive. To address this issue, we use local Laplacian queries, allowing each user to publish a noisy version of their degree. Let the noisy degree of νV be denoted as d~(ν). We then assign indices to users based on these noisy degrees, such that if i>j, then d~(νi)d~(νj). Afterward, we run the protocol described in the previous paragraph. We show that even with noisy degrees, the expected number of such cycles remains bounded by O(δ3n).

In summary, our mechanism involves two steps. First, users publish their noisy degrees using the local Laplacian mechanism, and the server assigns indexes based on these noisy values. In the second step, using the results of randomized response, each user νi estimates the number of triangles (νi,νj,νκ) where j<i<κ. This method significantly reduces the number of dependent triangle pairs in degeneracy-bounded graphs, which in turn lowers the variance of the estimation.

1.3 Related Works

The field of graph data mining under local differential privacy is relatively new. In contrast, differential privacy has been studied for many years by various researchers, including works like [18, 28]. According to [22], local differential privacy typically only hides edges or relationships, except in special cases like [36]. Differential privacy, on the other hand, can hide whether an individual or node is part of a social network, as shown in [19, 30]. Therefore, while both edge and node differential privacy exist, node differential privacy does not apply in the context of local differential privacy.

Recent works have proposed methods to estimate the densest subgraph, k-core decomposition, and degeneracy under local differential privacy [10, 9, 11]. However, since we are focused on estimating different graph statistics in graphs, we do not use or extend the ideas from these works. Instead, the estimation of degeneracy can be used to approximate the 2-error of our algorithm.

2 Preliminaries

2.1 Notations

For V={ν1,,νn} a set of vertices and EV2 a set of edges, we denote by G=(V,E) the graph on V. We consider simple undirected graphs, meaning that for ν,νV, (ν,ν)E and (ν,ν)E(ν,ν)E. We denote by n=|V| the size of the graph and m=|E| its number of edges.

For each i[1,n], we introduce ai=[ai,1,,ai,n], the adjacency list of user νi, where for any j[1,n], ai,j=1 if the edge (νi,νj) is in E and ai,j=0 otherwise. Additionally, we introduce di, the degree of node νi, which corresponds to the number of edges incident to νi.

We call a path of length k, denoted Pk, any tuple (νl1,,νlk) such that, for all i[1,k], (νli,νli+1)E, and, for all ij, νliνlj. We also use #Pk(G) to refer to the number of paths of length k in G. Similarly, a cycle of length k, or Ck, is a tuple (νl1,,νlk) that forms a path and satisfies (νlk,νl1)E. We will also use #Ck(G) to refer to the number of cycles of length k in G.

2.2 Edge Local Differential Privacy

We say that two adjacency lists a and a are neighboring if they differ by one bit, i.e. if we can go from one to the other by adding or removing an edge to node νi. If a is a neighbor of a, we write that aa. The notion of edge local differential privacy is as follows:

Definition 1 (ε-edge local differentially private query).

Let ε>0. A randomized algorithm is a ε-edge local differentially private query on the node νi if, for all neighboring bit strings aa, and for all S, it holds that

[(a)S]eε[(a)S].
Definition 2 (ε-edge local differentially private algorithm [29]).

Let 𝒜 be an algorithm that generates multiple randomized queries for each user, has each user apply these queries to their adjacency vector, and then estimates some graph statistics based on the results. We say 𝒜 is an ε-edge local differentially private algorithm if, for all users νi and for all possible sets of queries 1,,k inquired to νi (where for each 1jk, j is an εj-edge local differentially private query), it holds that ε1++εkε.

2.3 Laplacian Query and Restricted Sensitivity

Next, we introduce queries that are ε-edge local differentially private. We first consider a query which aims to give an estimate of a real number statistics of the adjacency vector.

Definition 3 (Edge local Laplacian query [21]).

For a function f:{0,1}n on adjacency lists, and aa denoting neighboring adjacency lists, the global sensitivity of f is defined as Δf=maxaa|f(a)f(a)|. For ε>0, the query that outputs f(a)+Lap(Δf/ε) is ε-edge local differentially private, where Lap(b) represents noise drawn from the Laplacian distribution with parameter b.

Global sensitivity in Definition 3 is designed to handle the worst-case scenario, which can lead to large amounts of noise being added to the data when using the Laplacian mechanism. However, if the data is known to belong to a specific set, restricted sensitivity allows us to adjust the noise according to the sensitivity within that set, resulting in more tailored and potentially lower noise levels.

Definition 4 (Restricted sensitivity (Definition 8 of [4])).

Let a=(a1,,an),a=(a1,,an){0,1}n and d(a,a) be the Hamming distance between a and a. The restricted sensitivity of f over a set of possible output is

RSf()=maxa,a(|f(a)f(a)|d(a,a)).

We can use restricted sensitivity to publish data even if it is not initially in the set. To do this, we first need to define a projection method to map the data to the set. In this work, we will consider d, the class of adjacency list with a maximum degree of d, for calculating restricted sensitivity. We assume that the order of all nodes is fixed, and if a node νi is adjacent to more than d nodes, we retain only the first d nodes according to this order. The map can be considered as an operation on each adjacency vector ai. We denote the mapping result on ai as μd(ai).

Definition 5 (Edge local Laplacian query with restricted sensitivity on d [4]).

For any f queried to a user i, the query that answers f(μ(ai))+Lap(3RSf(d)/ε) is called edge local Laplacian query with restricted sensitivity on d, and provides ε-edge local differential privacy.

2.4 Unbiased Randomized Response

In this subsection, we consider the randomized response query, which aims to publish an obfuscated adjacency vector.

Definition 6 (Randomized response query [33, 32]).

For ε>0, the randomized response mechanism takes an adjacency list a=(a1,,an) as input and outputs an obfuscated list a~=(a~1,,a~n). For i, the probability that a~i is set to 1 is given by:

[RR(a~i)=1]={eε1+eεif ai=111+eεif ai=0.

With this definition, randomized response provides ε-edge local differential privacy.

We can construct a graph G~ based on the collection of obfuscated adjacency vectors obtained from all users. Using the statistics of the obfuscated graph G~, we can then publish various information, including the number of subgraphs [34, 22, 23, 20]. However, randomized response produces biased results, making it less suitable for counting queries. This bias can be fixed by the subsequent definition.

Definition 7 (Unbiased randomized response query [16]).

Let ε>0 and a~i be the adjacency vector published through randomized response with budget ε by user νi. Then, for all (i,j)[1,n]2,

a^i,j=eε+1eε1a~i,j1eε1

is an unbiased estimator of ai,j. Additionally, for (i,j)(i,j), a^i,j is independent of a^i,j, and Var(a^i,j)=eε(eε1)2. We refer to a query that publishes a^i as the unbiased randomized response query. This query is ε-edge locally differentially private.

We can use the results from the unbiased randomized response query to calculate the number of subgraphs. For example, without privacy constraints, the number of triangles can be calculated as i<j<kai,jaj,kak,i. To privately estimate the number of triangles, we use i<j<ka^i,ja^j,ka^k,i. It is theoretically shown in [16] that the estimator i<j<ka^i,ja^j,ka^k,i has a smaller 2-error compared to the estimator obtained from the randomized response query, i<j<ka~i,ja~j,ka~k,i.

2.5 Graph Arboricity and Degeneracy

Graph arboricity and degeneracy can be defined as follows:

Definition 8 (Arboricity).

The arboricity of a graph G is the minimal number α(G) such that the edges of G can be partitioned into α(G) forests.

Definition 9 (Degeneracy).

The degeneracy of a graph G is the smallest number δ(G) such that any subgraph of G, contains at least one node with induced degree at most δ(G).

We observe that the variable δ is frequently used as a privacy parameter in differential privacy. However, since we do not consider that parameter in this paper, we choose to use δ to represent degeneracy, which is also a common convention. When the context is clear, we will drop the G of the notation and simply write α and δ. The two quantities are linked by the following theorem.

Theorem 10 (equation 3 and lemma 2.2 of [37]).

In any graph G, degeneracy and arboricity satisfy αδ2α1.

The arboricity has previously been used outside of the differential private community to bound some graph statistics. A folklore useful result is that the number of edges in a graph is smaller than δn. Another well known result is as follows:

Theorem 11 (Chiba-Nishizeki Bound [6]).

With m=|E| and di the degree of node νi, then

(νi,νj)Emin(di,dj)mα.

3 Node-Reordered Graphs and Their Properties

The first step of our mechanism is to order the vertices based on their estimated degree. The algorithm for this step is shown in Algorithm 1. At Line 2 of the algorithm, we privately publish the estimated degree. Under edge local differential privacy, the global sensitivity of the degree is 1. Therefore, we can use the Laplacian query (Definition 3) with noise scaled to 1/ε0 to publish the degree, where ε0 is the privacy budget allocated to this step. We denote the estimated degree as d~i=di+Lap(1/ε0).

Algorithm 1 Calculate a low degree ordering of a graph with respect to the estimated degree.

After publishing the estimated degrees, in Line 3, we assign an order ϕ to the nodes based on their degrees, which we refer to as a low degree ordering. For G=({ν1,,νn},E), we denote the reordered graph as Gϕ=(Vϕ,Eϕ), where Vϕ={ηii[1,n]} and νi=ηϕ(i) for all i. The edge set Eϕ is defined as {(ηϕ(i),ηϕ(j))(νi,νj)E}. We note that G and Gϕ are isomorphic, and thus have the same number of subgraphs. We denote by di(Gϕ) the degree of ηi in Gϕ and di(Gϕ) the number of neighbors of node ηi in the set {η1,,ηi1}.

Table 2: List of subgraphs analyzed in Section 3, including their representations and bounds on their counts in the graph produced by Algorithm 1. Oriented edges indicate directionality, with an arrow from νj to νi signifying that j>i. The bound on S2 aligns with the bound on S2 presented in [6], but it constitutes a distinct contribution as it is established for imperfectly ordered graphs. In contrast, the result on C2k is entirely novel to this work and serves as the primary result of our proof.
Symbol S2 Pk Ck C2k
Representation
Bound 𝒪(δ2n) 𝒪(δk2nk2+1) 𝒪(δk2nk2) [25] 𝒪(δk+1nk1)

In the remainder of this section, we analyze the properties of graphs produced by the reordering. Specifically, our focus is on bounding the frequency of certain substructures within the reordered graph. A summary of the results from this section is provided in Table 2.

Definition 12 (low star).

For k, a low-k-star is a subgraph consisting of a central node and k neighboring nodes, where at least one of the neighboring nodes has an index smaller than that of the central node. We denote by Sk(G) the number of such subgraphs contained in a graph G.

Theorem 13.

𝔼[S2(Gϕ)]𝒪(δ2n).

Proof.

Let 𝒩i(Gϕ) be the set of neighbors of ηi in Gϕ. We have that:

S2(Gϕ) =i=1ndi(Gϕ)(di(Gϕ)1)i=1ndi(Gϕ)×di(Gϕ)=i=1ndi(Gϕ)ηj𝒩i(Gϕ)𝟙j<i
=(ηi,ηj)Eϕdmax(i,j)(Gϕ)

Let τi denote the noise added to the estimated degree of user i. For each edge (ηi,ηj), their ranks can only be exchanged if the sum of the errors in both degree estimations exceeds the gap between the two degrees. Therefore, the quantity dmax(i,j)(Gϕ) satisfies

dmax(i,j)(Gϕ)min(di,dj)+|τi|+|τj|.

Using this inequality, we can rewrite the count of S2(Gϕ) as

S2(Gϕ)(ηi,ηj)Eϕmin(di,dj)+i=1n|τi|di.

Since τi is sampled from Lap(1/ε0), we have that |τi| follows an exponential law of expectation 1/ε0. Hence,

𝔼[S2(Gϕ)](νi,νj)Eϕmin(di,dj)+mε0.

Since G is isomorphic to Gϕ, α(G)=α(Gϕ) and using Theorem 11 it follows that

(νi,νj)Eϕmin(di,dj)mα(G).

Since mnδ and α(G)=O(δ), this gives 𝔼[S2(Gϕ)]𝒪(δ2n).

In addition to the ordered stars we just discussed, arboricity can also be used to bound the number of paths and cycles in a graph, as demonstrated in the following lemma and theorem. Recall that #Pk(G) is the number of paths with length k in the graph G.

Lemma 14.

For any positive integer k, #P2k(G)=𝒪(δknk+1),#P2k+1=𝒪(δk+1nk+1).

Proof.

We first consider #P2k+1(G). Let f be a function that maps a path of length 2k+1 to a tuple of k+1 edges, defined as f(e1,,e2k+1)=(e1,e3,,e2k+1). We observe that, for any tuple of k+1 edges denoted by =(e1,,ek+1), f1() is either a set containing one path or an empty set. There is at most one path that uses ei as the (2i1)-th edge of the path for all i. Thus, we can conclude that the number of paths of length 2k+1 is at most the number of sets of k+1 edges, which is mk+1=𝒪(δk+1nk+1).

Next, let us consider #P2k(G). Let f be a function that maps a path of length 2k to a tuple of k edges, defined as f(e1,,e2k)=(e1,e3,,e2k1). We observe that, for any tuple of k edges denoted by =(e1,,ek), f1() is a set of size no larger than n. There is at most one path of length 2k1 that uses ei as the (2i1)-th edge of the path, and there are at most n possible ways to extend a path of length 2k1 to a path of length k. Hence, #P2k(G)nmk=𝒪(δknk+1).

Recall that #Ck(G) is the number of cycles with size k in the graph G. We obtain the following theorem.

Theorem 15.

For any k1, #Ck+2(G)2kα(G)#Pk(G).

Proof.

Let us denote #Pk(i) the number of paths of length k that have node νi as an extremity and #Ck(i,j) the number of cycles of length k containing edge (νi,νj). Using these notations, we have #Ck+2=1k(νi,νj)E#Ck+2(i,j). Consider the number #Ck+2(i,j). For a path of length k that has a node νi as a terminal, there is at most one cycle of length k+2 which includes this path and the edge (νi,νj). Therefore, we conclude that #Ck+2(i,j)#Pk(i). Similarly, we have #Ck+2(i,j)#Pk(j). Hence,

#Ck+21k(νi,νj)Emin(#Pk(i),#Pk(j)).

For any function h:E{1,,n} such that for all e=(νi,νj)E, h(e) is equal to either i or j, min(#Pk(i),#Pk(j))#Pk(h(νi,νj)). By definition of the arboricity, there exists a set of disjoint forests {Fl}l=1,,α(G) such that E=l=1α(G)Fl. By choosing a root for each tree of these forests, we can introduce a function h such that each edge has its child node as an image. In this way, each node can only be the image of one edge per forest. This leads to

#Ck+2 1kl=1α(G)(νi,νj)Flmin(#Pk(i),#Pk(j))1kl=1α(G)eFl#Pk(h(e))1kl=1α(G)iV#Pk(i)
= 2kα(G)#Pk.

The last step is justified by the fact that each path having two extremities, the sum of all the paths of length k starting with node νi is twice the number of paths of length k.

Combining Lemma 14 and Theorem 15, we obtain the following corollary111We note that this result was independently established in [25] by a different proof. We are grateful to the anonymous reviewer for bringing this to our attention..

Corollary 16.

For k1, #C2k+2=𝒪(δk+1nk+1) and #C2k+1=𝒪(δk+1nk).

Next, we focus on the number of cycles of length 2k for any k2, in which three consecutive vertices of the cycle exhibit monotonic ranks C2k, as illustrated in Table 2. Throughout the rest of this article, we will denote the count of such subgraphs in G by #C2k(G), omitting G from the notation when the context is clear. In the following theorem, for simplicity, we extend the notation by assuming #P1(G)=1 and #P0(G)=n for every graph G.

Theorem 17.

For k2, #C2k(G)2α(G)S2(G)#P2k5(G).

Proof.

Let #C2k(i,j)(G) represent the number of subgraphs in G where three consecutive vertices exhibit monotonic ranks, with (νi,νj) being the edge immediately following these consecutive vertices. Also, for k2, let the number of paths of length p with a low-2-star as one of its extremities be denoted as #Pp. Since we can construct at most one path included in #Pp where a low-2-star and a path of length p3 are its extremities, we obtain the inequality #PpS2#Pp3.

Let C2k(i,j) be a cycle which is counted in #C2k(i,j). Consider the path in C2k(i,j) of length 2k2 starting from νi that does not pass through νj and the other path in C2k(i,j) of the same length starting from νj that does not pass through νi. We observe that one extremity of the two paths is a low-2-star. Hence, #C2k(i,j)min(#P2k2(i),#P2k2(j)) when #Pp(i) is the number of paths in the count of #Pp that have νi as an extremity. Using the same definition of h as in the proof of Theorem 15, we have

#C2k(G) (νi,νj)E#C2k(i,j)(νi,νj)Emin(#P2k2(i),#P2k2(j))
l=1α(G)(νi,νj)Flmin(#P2k2(i),#P2k2(j))l=1α(G)eFl#P2k2(h(e))
l=1a(G)iV#P2k2(i)2α(G)#P2k22α(G)S2#P2k5.

The next corollary follows Theorem 13, 17, and Lemma 14.

Corollary 18.

For k2, 𝔼[C2k(Gϕ)]=𝒪(δk+1nk1).

The next corollary considers the number of edge sets in Gϕ with specific properties.

Corollary 19.

For any p, we consider edge sets 𝖤Eϕ of size 2p such that 1) for some c>0, there exists a set of cycles C1,,Cc in Gϕ where C1Cc=𝖤 and CiCj= for ij, and 2) at least one of C1,,Cc contains three consecutive vertices of monotonic index. The number of such edge sets is 𝒪(δp+1np1).

Proof.

Consider a partition of 2p, denoted by (p1,,pc), where p1++pc=2p. The number of such partitions is a function of p and can be considered constant. We will demonstrate that the number of cycle sets C1,,Cc satisfying the conditions in the corollary statement, with |Ci|=pi, is at most 𝒪(δp+1np1). Therefore, the number of cycle sets satisfying the corollary statement is no more than 𝒪(δp+1np1).

To prove the bound, we will consider two cases: either all the cycles have even lengths, or at least two of them have odd lengths, given that the total number of edges is even.

If all the cycles are of even length, then, for some q>0 one of them is of length 2q and includes 3 consecutive vertices of monotonic index. By Corollary 18, there are 𝒪(δq+1nq1) possibilities for this cycle. For the remaining cycles, Corollary 16 tells us that the number of admissible configurations is bounded by 𝒪(δpqnpq). In total, this gives a 𝒪(δp+1np1) bound. If at least two cycles have odd lengths, say 2q+1 and 2r+1, then by Corollary 16, the number of possible configurations for these cycles can be bounded by 𝒪(δq+1nq) for the first cycle and 𝒪(δr+1nr) for the second cycle, and 𝒪(δpqr1npqr1) for the remaining cycles. Overall, this results in a bound of 𝒪(δp+1np1).

4 Triangle Counting Algorithm

We propose Algorithm 2 to count the number of triangles based on the ordering and properties discussed in the previous section. First, we execute Algorithm 1 at Line 2. Next, at Line 3, we use the randomized response query to obtain an obfuscated graph. From Lines 4 to 8, we employ the Laplacian query with restricted sensitivity on d (Definition 5) to estimate the number of triangles associated with User i. Finally, at Line 9, we sum all the estimates and report the total as the estimated triangle count. We adopt the concept from [22] of distributing randomized response results to all nodes and having each node estimate its number of triangles. However, the other algorithmic ideas presented in this work are novel. In the following theorem, we demonstrate that our algorithm is differentially private.

Algorithm 2 Our algorithm for estimating the number of triangles in degeneracy-bounded graphs.
Theorem 20.

Algorithm 2 provides (ε0+ε1+ε2)-edge local differential privacy.

Proof.

For all possible executions of Algorithm 2, it inquires three queries to all users. They are 1) the Laplacian query with privacy budget ε0 inside the GetOrderingfunction at Line 2, 2) the unbiased randomized response query with privacy budget ε1 at Line 3, and 3) the Laplacian query with restricted sensitivity on d at Lines 4-8.

To prove this theorem, we only need to show that the query at Lines 4-8 is ε2-edge local differentially private. The query aims to publish f(aiϕ)=(j,k)Sia^j,kϕ. By the unbiased randomized response in Line 3, we have that, for any j,k,j,k, |a^j,kϕa^j,kϕ|eε1+1eε11. It can be shown that, for aiϕ,aiϕd^iϕ (defined in Definition 5) such that d(aiϕ,aiϕ)𝖽, the number of different elements in the set Si obtained from aiϕ,aiϕ at Line 6 is at most 𝖽d^iϕ. Therefore, the restricted sensitivity of the function f (denoted by RSf(d^iϕ)in Definition 5) is not larger than 𝖽d^iϕeε1+1eε111𝖽=d^iϕeε1+1eε11. Hence, by Definition 5, the publication of t~i at Line 8 is ε2-edge local differentially private.

We now discuss the accuracy of our estimation and its relation with the parameter ζ appearing at Line 4 the algorithm. We will see that ζ controls the trade off between the bias and the accuracy. The smaller ζ is, the smaller the average noise gets, but the larger the probability of bias and its expected magnitude is.

In the following lemma, we discuss that the projection μd^iϕ applied at Line 5 changes the adjacency vector aiϕ only with small probability.

Lemma 21.

For any ζ>0, with probability at least 1ζ, |d~idi|<(lnnζ)/ε0 for all i.

Proof.

Using the cumulative distribution function of the Laplacian random variable, we have [|d~idi|ε0lnnζ]ζn. Thus, by taking this inequality for all i[1,n], and using the union bound, we obtain [i[1,n],|d~idi|ε0lnnζ]ζ.  

We show that our estimation has no bias with high probability in the subsequent theorem.

Theorem 22.

With probability at least 1ζ, algorithm 2 provides an unbiased estimate of the number of triangles in the graph, i.e. 𝔼[f^(G)]=#C3(G).

Proof.

As discussed in Definition 7, we have that 𝔼(a^j,kϕ)=aj,kϕ. Using Lemma 21, with probability at least 1ζ, d^iϕ is larger than diϕ for all i[1,n], and the function μd^iϕ has no effect. Consequently, Si precisely represents the set of forks centered on node νi, encompassing all possible triangles. Therefore, t^i is an unbiased estimate of the number of triangles (νi,νj,νk) such that j<i<k. Given that Laplace noise is centered and triangles can be decomposed accordingly, f^(G) is an unbiased estimation of f(G).

Corollary 23 ensures that even in the unlikely event of some clipping occurring, the resulting bias would still represent only a small fraction of the actual count.

Corollary 23.

The expected value of the bias of Algorithm 2 is bounded by 𝒪(ζε0n#C3).

Proof.

When the corrected estimated degree d^iϕ is smaller than the actual degree di, did^iϕ edges are excluded. This exclusion introduces a bias because the potential triangles involving these excluded edges are not counted. For each user i and their neighbor j, let ti(j) denote the number of triangles counted by user i that involve the edge (νi,νj). We also define timax=maxjti(j). Then, the maximum bias resulting from a single clipped edge can be bounded by timax.

The expected number of clipped edges for user i is determined by evaluating the following integral, where β=ln(n/ζ)ε0 serves as the correction term for the degree:

βε02eε0|x|(xβ)𝑑x=ε02[xβε0eε0x+1ε02eε0x]β=12ε0eε0β=ζ2ε0n.

We obtain the final result by combining these elements and observing that itimaxi,jti(j)2f(G).

The accuracy of our estimation is demonstrated in the subsequent theorem.

Theorem 24.

When ζε0, the squared expected 2-error of algorithm 2 is bounded by

𝒪(δ3nε12+δdmaxnε12ε22+nln2(n/ζ)ε02ε12ε22).

Proof.

The squared 2-error can be decomposed into the square of the bias plus the variance. We have established in Corollary 23 that the bias of the algorithm is bounded by 𝒪(ζε0n#C3)=𝒪(δ2). We will now focus on bounding the variance of the algorithm. This variance arises from two distinct sources: the randomized response query and the Laplacian query with restrictive sensitivity.

Regarding the noise introduced by the Laplacian query with restrictive sensitivity, its variance is simply the sum of the variances of each term, which is

9(eε1+1eε11)2νiVd^i2ε22=𝒪(ε02δdmaxn+nln2(n/ζ)ε22ε12ε02).

Next, we consider the variance from the randomized response query. In the following equations, we use the notation 𝒩j,k to denote the set of neighbors νi of both νj and νk such that j<i<k. Note that by including one node from 𝒩j,k along with νj and νk, a triple in S2 is formed. Similarly, including two nodes from 𝒩j,k along with νj and νk results in a quadruplet in #C4. We also notice from Definition 7 that, for (j,k)(j,k), a^j,kϕ is independent to a^j,kϕ and Cov(a^j,kϕ,a^j,kϕ)=0. Hence,

Var(νiVϕ(j,k)Sia^j,kϕ) =(νj,νk)(Vϕ)2[νi𝒩j,kVar(a^j,kϕ)+νi,νi𝒩j,kCov(a^j,kϕ,a^j,kϕ)]
=𝒪((S2+#C4)/ε12)

By Theorem 13 and Collorary 18, Var(f^(G))=𝒪(δ3nε12+δdmaxnε12ε22+nln2(n/ζ)ε02ε12ε22).  

In the previous work [16], the number of terms in the variance calculation is bounded by the number of cycles of length four, which is 𝒪(dmax3n). We reduce that number to #C4=𝒪(δ3n) using the GetOrderingfunction in Line 2 and by including only pairs (j,k) such that j<i<k. It is known that δdmax and, in many practical graphs, the degeneracy is much smaller than the maximum degree.

5 Odd Length Cycle Counting

In this section, we will describe how to utilize low-degree ordering to accurately count odd-length cycles in graphs with bounded degeneracy. Some concepts are extended from the previous section. As shown in Algorithm 3, the algorithm for estimating the number of odd-length cycles is similar to Algorithm 2, except that the restricted sensitivity at Line 9 is larger, and at Line 8, we replace a^i,jϕ with an estimate for the number of paths under specific constraints. We discuss the privacy of the algorithm in the subsequent theorem. The main challenge of the proof is to demonstrate that the Laplacian query under restricted sensitivity at Lines 5-9 is ε2-differentially private.

Algorithm 3 Our algorithm for estimating the number of odd-length cycles in degeneracy-bounded graphs.
Theorem 25.

Algorithm 3 provides (ε0+ε1+ε2)-edge local differential privacy.

Proof.

We need to demonstrate that Lines 5-9 of the algorithm, involving the Laplacian query with restricted sensitivity on d^iϕ, ensure ε2-differential privacy. Following the arguments of Theorem 20, we assert that altering 𝖽 entries of ai,jϕ changes the set Si by at most 𝖽d^iϕ elements. A single element change in Si can alter the value of c^i by

#P^k2(i)(j,κ)=(l1,,lk1)Xk2(i)(j,κ)q[1,k2]a^lq,lq+1ϕ(eε+1eε1)2#P^k4.

Therefore, the restricted sensitivity of c~i is 𝖽d^iϕ(eε+1eε1)2#P^k4/𝖽=d^iϕ(eε+1eε1)2#P^k4. Consequently, the publication of c~i at Line 9 is ε2-differentially private.

The bias of the algorithm is given in the following theorem.

Theorem 26.

With a probability of at least 1ζ, Algorithm 3 provides an unbiased estimate of the number of k-cycles in any graph G for any odd integer k.

Proof.

Since we publish a^lq,lq+1ϕ using the unbiased randomized response query, the publication is an unbiased estimation of alq,lq+1ϕ. Furthermore, as those estimators are independent from one another, for each (j,κ)Si and {l1,,lk1}Xk2(i)(j,κ), q[1,k2]a^lq,lq+1ϕ is an unbiased estimate of q[1,k2]alq,lq+1ϕ. It results from this that #P^k2(i)(j,κ) is an unbiased estimator of the number of paths between j and κ with length k2 such that, for any three consecutive nodes (νq,νr,νs) with monotonic ranks, the node νi has a lower rank that νr. We denote the number of such paths as #Pk2(i)(j,κ).

Let us introduce Ck(i)=(j,k)Si#Pk2(i)(j,κ). Assuming no clipping occurs, which happens with a probability of at least 1ζ, we have by linearity of expectation that both c^i and c~i are unbiased estimators of Ck(i). Therefore, all that remains to be proven is that the number of k-cycles in G is equal to νiVϕCk(i). It is evident that for each element counted in νiVϕCk(i), there is a corresponding cycle (νi,l1,,lk1) in Gϕ and, also, in G.

Conversely, consider a cycle of length k in G. Since it is also a cycle in Gϕ, we can represent it in Gϕ as (ν1,,νk). Because the cycle is of odd length, there exist three consecutive nodes with a monotonic rank. Among all possible triplets, consider the one where the central node has the smallest rank, denoted as (νj,νi,νκ) with j<i<κ. Furthermore, let j=l1 and κ=lk1, and assign the indices of the other nodes in the cycle to l2 through lk2 in the order they appear in the cycle. Thus, the cycle is counted in νiVϕCk(i). Furthermore, if any other node in the cycle were chosen as νi, the remaining path would not be part of Xk2(i)(j,κ). This ensures that each cycle is counted exactly once in νiVϕCk(i).

Finally, the 2-error of Algorithm 3 is proven in the next theorem. The most challenging aspect of this theorem is to bound the covariance in the summation at Lines 8 and 10. We assert that any two dependent elements of Xk2(i)(j,κ) can be considered as a set containing an even number of edges which forms multiple disjoint cycles with specific properties. Consequently, we can utilize our results from Corollary 19 to bound the number of such pairs. The proof of the theorem is given in the appendix of this paper.

Theorem 27.

When ζε0, the expected squared 2-error of algorithm 3 is bounded by

𝒪(δ3ε12(1ε12+δ)k3nk2+δk2dmaxnk2ε22ε14+δk3nk2ln2(n/ζ)ε22ε14ε02).

Before proving Theorem 27, we demonstrate the following lemma.

Lemma 28.

The expected value of the bias of Algorithm 3 is bounded by 𝒪(ζε0n#Ck).

Proof.

We have already seen the proof of Corollary 23 that the expected value of the number of clipped edges for user i was bounded by ζ2ε0n. We now have to bound the bias created by one edge removal, i.e. the maximal number of cycles one edge can part of.

With ci(j) the number of cycles counted by i that involve edge (i,j), the maximal bias for user i is bounded by jci(j), and the bias of the algorithm by ζ2ε0ni,jci(j)𝒪(ζε0n#Ck).

Now, we are ready to prove Theorem 27.

Proof of Theorem 27.

The squared 2-error can be decomposed into the square of the bias plus the variance. In Lemma 28, we established that the bias of the algorithm is bounded by 𝒪(ζε0n#Ck)=𝒪(δk+12nk32). We will now focus on bounding the variance of the algorithm.

Let the indicator variable 𝟙(l1,,lp) be 1 if the path (νl1,,νlp) exists in Gϕ, and 0 otherwise. We also denote the random variable q[1,p]a^lq,lq+1ϕ by Z(l1,,lp+1). Finally, we define U(l1,,lp+1)=Z(l1,,lp+1)𝟙(l1,,lp+1). This random variable U(l1,,lp+1) has the properties that 𝔼[U(l1,,lp+1)]=0 and Var(U(l1,,lp+1))=Var(Z(l1,,lp+1)).

Similar to the case with triangles, the variance of Algorithm 3 arises from both the unbiased randomized response query and the Laplacian query with restricted sensitivity.

Concerning the variance term coming from the randomized response, we have to compute the variance of

C^=νiV(ci^ci)=νiV(j,κ)Si{l1,,lk1}Xk2(i)(j,κ)U(l1,,lk1).

We have to take into account the term that comes from the sum of the variances of the U as well as the one coming from the covariances between them.

To compute the sum of variances, we start with:

Var(U(l1,,lk1))=q[1,k2]Var(a^lq,lq+1ϕ)=𝒪(1ε12k4).

Additionally, for each i and (j,κ)Si, the cardinality of Xk2(i)(j,κ) is bounded by nk3, and the number of ways to choose (i,j,κ) is bounded by S2, which is 𝒪(δ2n) by Theorem 13. This contributes a term in the variance from the sum of variances bounded by 𝒪(δ2nk2/ε12k4).

To analyze the term arising from the covariances, we first examine the covariance between U(l1,,lk1) and U(l1,,lk1). In the following equations, let A be the set of edges that appear only in (l1,,lk1) or (l1,,lk1), and let B be the set of edges that appear in both. Recall that, for any (i,j) 𝔼[a^i,jϕ]=0 and 𝔼[a^i,jϕ]=Var(a^i,jϕ).

Cov(U(l1,,lk1),U(l1,,lk1))
=𝔼[q[1,k2]a^lq,lq+1ϕq[1,k2]a^lq,lq+1ϕ]𝟙(l1,,lk1)𝟙(l1,,lk1)
=(i,j)A𝟙(i,j)(i,j)BVar(ai,jϕ)q[1,k2]𝟙(lq,lq+1)𝟙(lq,lq+1).

We observe that the covariance between U(l1,,lk1) and U(l1,,lk1) is zero if the paths (νl1,,νlk1) and (νl1,,νlk1) do not share at least one common edge or if the edges present in only one of the paths are not present in the original graph. Now consider the situation where the covariance is non-zero. We have that |B|>0. Additionally, we will denote νi the node responsible for counting this instance of U(l1,,lk1) and νi the one responsible for U(l1,,lk1).

Let 𝖵:={νi,νl1,,νlk1,νi,νl1,,νlk1}, and let

𝖤:=1qk2{(νlq,νlq+1),(νlq,νlq+1)}{(νlk1,νi),(νi,νl1),(νlk1,νi),(νi,νl1)}.

In other words, the set 𝖤 consists of the edges in the paths {l1,,lk1} and {l1,,lk1}, along with the additional edges (νlk1,νi), (νi,νl1), (νlk1,νi), and (νi,νl1). Additionally, let

A :=A{(νlk1,νi),(νlk1,νi)(νlk1,νi)(νlk1,νi)}
{(νi,νl1),(νi,νl1)(νi,νl1)(νi,νl1)}.

Similarly, let

B :=B{(νlk1,νi)(νlk1,νi)=(νlk1,νi)}{(νi,νl1)(νi,νl1)=(νi,νl1)}.

In other words, the sets A and B are the sets A and B extended to include the additional edges (νi,νl1), (νlk1,νi), (νi,νl1), and (νlk1,νi).

We introduce 𝐝 the difference between the cardinal of B and B, 𝐝:=|B||B|. Let q[1,k2] be the cardinality of B. In this case, the covariance is 𝒪(1/ε12q). We have that |A|+2|B|=2k, which gives |A|=2k2q2𝐝.

In the next step, we will calculate the number of the pairs of paths with |A|=2(kq𝐝). Let us consider the degree of each node in (𝖵,𝖤). It is clear that the degrees are neither greater than four nor less than two. A node has a degree of three only if one of the three edges incident to it belongs to B and the other two to A. A node has a degree of four if all four edges incident to it are in A, and it has a degree of two if both edges incident to it are either in A or in B. Hence, if we consider the graph (𝖵,A), we have a graph of degree two or four, which is a union of multiple disjoint cycles.

Let the number of those disjoint cycles be c and the size of those cycles be r1,rc. We have that t=1crt=2k2q2𝐝, i.e. (r1,,rc) is a partition of 2k2q2𝐝. We know that the number of such partitions is bounded by a function of k. Let suppose that the bound is f(k).

Let us give the number of A with cycle size (r1,,rc). We can use Corollary 16 to show that the number of such sets A is 𝒪(t=1cδrt/2nrt/2)=𝒪(δkq𝐝nkq𝐝). When 𝐝=0, we know that {νi,νj} and {νi,νk} are in A. There are three consecutive nodes with monotonic ranks in the union of disjoint cycles (𝖵,A). Hence, we can use Corollary 19 to show that the number of such sets A is bounded by 𝒪(δkq+1nkq1). By combining the two cases, we can conclude that the number of possible sets A with cycle size (r1,,rc) is at most 𝒪(δkq+1𝐝nkq1). The number of possible A is then f(k)𝒪(δkq+1𝐝nkq1). As k is a constant, the number is 𝒪(δkq+1𝐝nkq1).

We then consider the number of configurations for B, which consists of a union of disjoint paths. Let the number of paths be c and their lengths be r1,,rc. We have that |r1|++|rc|=q, and (r1,,rc) forms a partition of q. The number of possible partitions is bounded by a function of k, denoted as f(k). Each part must begin and end in the node set A, where |A|2k. Therefore, the number of possible paths rt is at most 4k2nrt1, and the number of possible sets B with the partition (r1,,rc) is at most t=1c4k2nrt1=𝒪(nq1). Hence, the total number of possible sets B is f(k)𝒪(nq1)=𝒪(nq1).

Consequently, for each set A, the number of possible configurations for B is at most 𝒪(nq1). The number of pairs of paths {l1,,lk1} and {l1,,lk1} with |A|=2(kq𝐝) is then at most 𝒪(δkq+1𝐝nkq1nq1)=𝒪(δkq+1nk2). Each of these pairs contributes Var(a^j,kϕ)2q=𝒪(1/ε12q) to the covariance sum.

The covariance of C^ can then be calculated as follows:

𝒪(q=1k2δkq+1nk21ε12q)=𝒪(nk2δ3ε12(δ+1ε12)k3).

Since this bound is larger than the one for the sum of variances, we can disregard the latter.

To compute the variance resulting from the Laplacian query with restricted sensitivity, we sum the variance of the Laplacian distribution for all nodes:

9(eε+1eε1)4𝔼[#P^k42]νiVd^i2ε22=𝒪(δdmaxnε22ε14+nln2(n/ζ)ε22ε14ε02)𝔼[#P^k42]. (1)

Let us now consider the expected value

𝔼[#P^k42]=𝔼[#P^k4]2+Var(#P^k4)=#Pk42+Var(#P^k4). (2)

By Lemma 14 and the fact that k4 is an odd number, we have #Pk42=𝒪(δk3nk3). The variance can be decomposed into the sum of the variances of each path, which is bounded by 𝒪(nk3/ε12k8), and the sum of covariances.

The covariance is non-zero only if at least two edges are shared between the two paths and all edges that appear only once exist in the original graph. As previously discussed, this forms a cycle structure, except for the path extremities that do not need to be connected. Recall the definitions of the sets A and B from the previous paragraph.

The set A consists of two paths at the extremities and multiple disjoint cycles. Suppose the number of edges in A is 2p, the number of edges in the two paths are q1 and q2, and the number of disjoint cycles is c, with the number of edges in these cycles being r1,,rc. This gives us 2p=q1+q2+i=1cri. In other words, (q1,q2,r1,,rc) forms a partition of 2p2k. The number of such partitions is bounded by a function of k. Let the bound be f(k).

We now discuss the number of possible configurations of A for the partition (q1,q2,r1,,rc). From Lemma 14 and Corollary 16, the number of cycles of length q is bounded by 𝒪(δq/2nq/2), and the number of paths of length q is bounded by 𝒪(δq/2nq/2+1). Thus, the number of configurations for the partition (q1,q2,r1,,rc) is:

𝒪(δq1/2nq1/2+1δq2/2nq2/2+1t=1cδrt/2nrt/2)=𝒪(δpnp+2).

Hence, the number of possible configurations for A with 2p edges is no more than f(k)𝒪(δpnp+2)=𝒪(δpnp+2).

The number of edges in B is (2k82p)/2=kp4. Using the previous argument when calculating the number of possible set B, we obtain that the number of configurations for B is 𝒪(nkp5). The number of configurations with |A|=2p is then 𝒪(δpnp+2nkp5)=𝒪(δpnk3). Hence, the overall number of combinations is p=1k5𝒪(δpnk3)=𝒪(δk5nk3).

From the previous paragraph, we observe that the covariance term outweighs the sum of the variances, leading to Var(#P^k4)=𝒪(δk5nk3). Additionally, when calculating 𝔼[#P^k42] in (2), it is evident that #Pk42 dominates Var(#P^k4), resulting in 𝔼[#P^k42]=𝒪(δk3nk3). Substituting 𝔼[#P^k42] with 𝒪(δk3nk3) in (1), we find that the variance from the Laplacian mechanism is bounded by

𝒪(δk2dmaxnk2ε22ε14+δk3nk2ln2(n/ζ)ε22ε14ε02).

We obtain the theorem result by summing the variance from the unbiased randomized response query and the variance from the Laplacian query with restricted sensitivity.

6 Conclusion

In this work, we introduced a private vertex ordering algorithm. The transformation on the graph induced by this ordering reduces the count of specific order-sensitive motifs while preserving the overall graph structure. Due to its reliance on the Laplacian mechanism, the algorithm performs well even in high-privacy settings, making it an excellent preprocessing step for subgraph counting queries.

Within this framework, we first propose a new triangle counting algorithm whose accuracy depends on the count of specific ordered subgraphs. By combining this algorithm with the ordering preprocessing step, we achieve an expected error of 𝒪(n) for graphs with bounded degeneracy, compared to the 𝒪(n2) error seen in the current state of the art.

Subsequently, we extended the algorithm to address the more general case of odd-length cycle counting. We propose the first purely local differentially private counting algorithm specifically designed for cycles longer than triangles. Under the assumption of bounded degeneracy, the algorithm achieves an error of 𝒪(n(k1)/2) for cycles of length k.

Due to the constraints of local differential privacy, it might be assumed that the range of tasks we can perform on graphs under this privacy notion is limited. However, in this work, we demonstrate that more precise information can be published under local differential privacy by restricting our inputs to certain types of graphs. We believe that parameterized algorithms under local differential privacy represent an intriguing research area that can contribute significantly to both algorithm design and information privacy.

One limitation of this method is that the relative error can become significantly large when the number of cycles is small (or even zero), even in cases where the graph’s degeneracy – and consequently the 2-error of our algorithm – is high. Identifying a class of graphs for which an algorithm with bounded relative error can be designed would be a direction for future research. Another question for future investigation is determining lower bounds for degeneracy-bounded graphs under the local differential privacy.

References

  • [1] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. science, 286(5439):509–512, 1999.
  • [2] Suman K. Bera, Lior Gishboliner, Yevgeny Levanzov, C. Seshadhri, and Asaf Shapira. Counting subgraphs in degenerate graphs. ACM Journal of the ACM (JACM), 69(3):23:1–23:21, 2022. doi:10.1145/3520240.
  • [3] Louis Betzer, Vorapong Suppakitpaisarn, and Quentin Hillebrand. Publishing number of walks and katz centrality under local differential privacy. In The 40th Conference on Uncertainty in Artificial Intelligence, 2024. URL: https://openreview.net/forum?id=76UkTmdmkB.
  • [4] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. Differentially private data analysis of social networks via restricted sensitivity. In Innovations in Theoretical Computer Science, ITCS ’13, Berkeley, CA, USA, January 9-12, 2013, pages 87–96. ACM, 2013. doi:10.1145/2422436.2422449.
  • [5] Marco Bressan. Faster algorithms for counting subgraphs in sparse graphs. Algorithmica, 83(8):2578–2605, 2021. doi:10.1007/s00453-021-00811-0.
  • [6] Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210–223, 1985. doi:10.1137/0214017.
  • [7] Graham Cormode, Somesh Jha, Tejas Kulkarni, Ninghui Li, Divesh Srivastava, and Tianhao Wang. Privacy at scale: Local differential privacy in practice. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, pages 1655–1658. ACM, 2018. doi:10.1145/3183713.3197390.
  • [8] Damien Desfontaines and Balázs Pejó. Sok: Differential privacies. Proc. Priv. Enhancing Technol., 2020(2):288–313, 2020. doi:10.2478/popets-2020-0028.
  • [9] Laxman Dhulipala, George Z. Li, and Quanquan C. Liu. Near-optimal differentially private k-core decomposition. CoRR, abs/2312.07706, 2023. doi:10.48550/arXiv.2312.07706.
  • [10] Laxman Dhulipala, Quanquan C. Liu, Sofya Raskhodnikova, Jessica Shi, Julian Shun, and Shangdi Yu. Differential privacy from locally adjustable graph algorithms: k-core decomposition, low out-degree ordering, and densest subgraphs. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 754–765. IEEE, 2022. doi:10.1109/FOCS54457.2022.00077.
  • [11] Michael Dinitz, Satyen Kale, Silvio Lattanzi, and Sergei Vassilvitskii. Improved differentially private densest subgraph: Local and purely additive. CoRR, abs/2308.10316, 2023. doi:10.48550/arXiv.2308.10316.
  • [12] Pål Grønås Drange, Patrick Greaves, Irene Muzi, and Felix Reidl. Computing complexity measures of degenerate graphs. In 18th International Symposium on Parameterized and Exact Computation, IPEC 2023, September 6-8, 2023, Amsterdam, The Netherlands, volume 285 of LIPIcs, pages 14:1–14:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.IPEC.2023.14.
  • [13] Cynthia Dwork. Differential privacy. In Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II, volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer, 2006. doi:10.1007/11787006_1.
  • [14] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer, 2006. doi:10.1007/11681878_14.
  • [15] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014. doi:10.1561/0400000042.
  • [16] Talya Eden, Quanquan C. Liu, Sofya Raskhodnikova, and Adam D. Smith. Triangle counting with local edge differential privacy. In 50th International Colloquium on Automata, Languages, and Programming, ICALP 2023, July 10-14, 2023, Paderborn, Germany, volume 261 of LIPIcs, pages 52:1–52:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ICALP.2023.52.
  • [17] Alexandre V. Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA, pages 211–222. ACM, 2003. doi:10.1145/773153.773174.
  • [18] Anupam Gupta, Katrina Ligett, Frank McSherry, Aaron Roth, and Kunal Talwar. Differentially private combinatorial optimization. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 1106–1125. SIAM, 2010. doi:10.1137/1.9781611973075.90.
  • [19] Michael Hay, Chao Li, Gerome Miklau, and David D. Jensen. Accurate estimation of the degree distribution of private networks. In ICDM 2009, The Ninth IEEE International Conference on Data Mining, Miami, Florida, USA, 6-9 December 2009, pages 169–178. IEEE Computer Society, 2009. doi:10.1109/ICDM.2009.11.
  • [20] Quentin Hillebrand, Vorapong Suppakitpaisarn, and Tetsuo Shibuya. Communication cost reduction for subgraph counting under local differential privacy via hash functions. CoRR, abs/2312.07055, 2023. doi:10.48550/arXiv.2312.07055.
  • [21] Quentin Hillebrand, Vorapong Suppakitpaisarn, and Tetsuo Shibuya. Unbiased locally private estimator for polynomials of laplacian variables. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6-10, 2023, pages 741–751. ACM, 2023. doi:10.1145/3580305.3599537.
  • [22] Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. Locally differentially private analysis of graph statistics. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, pages 983–1000. USENIX Association, 2021. URL: https://www.usenix.org/conference/usenixsecurity21/presentation/imola.
  • [23] Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. Communication-efficient triangle counting under local differential privacy. In 31st USENIX Security Symposium, USENIX Security 2022, Boston, MA, USA, August 10-12, 2022, pages 537–554. USENIX Association, 2022. URL: https://www.usenix.org/conference/usenixsecurity22/presentation/imola.
  • [24] Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. Differentially private triangle and 4-cycle counting in the shuffle model. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022, pages 1505–1519. ACM, 2022. doi:10.1145/3548606.3560659.
  • [25] George Manoussakis. Listing all fixed-length simple cycles in sparse graphs in optimal time. In Fundamentals of Computation Theory - 21st International Symposium, FCT 2017, Bordeaux, France, September 11-13, 2017, Proceedings, volume 10472 of Lecture Notes in Computer Science, pages 355–366. Springer, Springer, 2017. doi:10.1007/978-3-662-55751-8_28.
  • [26] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA, Proceedings, pages 94–103. IEEE Computer Society, 2007. doi:10.1109/FOCS.2007.41.
  • [27] Jaroslav Nesetril and Patrice Ossona de Mendez. Sparsity - Graphs, Structures, and Algorithms, volume 28 of Algorithms and combinatorics. Springer, 2012. doi:10.1007/978-3-642-27875-4.
  • [28] Iyiola E. Olatunji, Thorben Funke, and Megha Khosla. Releasing graph neural networks with differential privacy guarantees. Trans. Mach. Learn. Res., 2023, 2023. URL: https://openreview.net/forum?id=wk8oXR0kFA.
  • [29] Zhan Qin, Ting Yu, Yin Yang, Issa Khalil, Xiaokui Xiao, and Kui Ren. Generating synthetic decentralized social graphs with local differential privacy. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, pages 425–438. ACM, 2017. doi:10.1145/3133956.3134086.
  • [30] Sofya Raskhodnikova and Adam D. Smith. Differentially private analysis of graphs. In Encyclopedia of Algorithms, pages 543–547. Springer, 2016. doi:10.1007/978-1-4939-2864-4_549.
  • [31] Sina Sajadmanesh and Daniel Gatica-Perez. Locally private graph neural networks. In CCS ’21: 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, November 15 - 19, 2021, pages 2130–2145. ACM, 2021. doi:10.1145/3460120.3484565.
  • [32] Yue Wang, Xintao Wu, and Donghui Hu. Using randomized response for differential privacy preserving data collection. In Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference, EDBT/ICDT Workshops 2016, Bordeaux, France, March 15, 2016, volume 1558 of CEUR Workshop Proceedings. CEUR-WS.org, 2016. URL: https://ceur-ws.org/Vol-1558/paper35.pdf.
  • [33] Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
  • [34] Qingqing Ye, Haibo Hu, Man Ho Au, Xiaofeng Meng, and Xiaokui Xiao. Towards locally differentially private generic graph metric estimation. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020, pages 1922–1925. IEEE, 2020. doi:10.1109/ICDE48307.2020.00204.
  • [35] Qingqing Ye, Haibo Hu, Man Ho Au, Xiaofeng Meng, and Xiaokui Xiao. LF-GDPR: A framework for estimating graph metrics with local differential privacy. IEEE Trans. Knowl. Data Eng., 34(10):4905–4920, 2022. doi:10.1109/TKDE.2020.3047124.
  • [36] Hailong Zhang, Sufian Latif, Raef Bassily, and Atanas Rountev. Differentially-private control-flow node coverage for software usage analysis. In 29th USENIX Security Symposium, USENIX Security 2020, August 12-14, 2020, pages 1021–1038. USENIX Association, 2020. URL: https://www.usenix.org/conference/usenixsecurity20/presentation/zhang-hailong.
  • [37] Xiao Zhou and Takao Nishizeki. Graph coloring algorithms. IEICE Transactions on Information and Systems, 83(3):407–417, 2000.