Infinitely Divisible Noise for Differential Privacy:
Nearly Optimal Error in the High $\varepsilon$ Regime
Abstract
Differential privacy (DP) can be achieved in a distributed manner, where multiple parties add independent noise such that their sum protects the overall dataset with DP. A common technique here is for each party to sample their noise from the decomposition of an infinitely divisible distribution. We analyze two mechanisms in this setting: 1) the generalized discrete Laplace (GDL) mechanism, whose distribution (which is closed under summation) follows from differences of i.i.d. negative binomial shares, and 2) the multi-scale discrete Laplace (MSDLap) mechanism, a novel mechanism following the sum of multiple i.i.d. discrete Laplace shares at different scales. For $\varepsilon \ge 1$, our mechanisms can be parameterized to have $O(\Delta^3 e^{-\varepsilon})$ and $O(\min(\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3}))$ MSE, respectively, where $\Delta$ denotes the sensitivity; the latter bound matches known optimality results. Furthermore, the MSDLap mechanism has the optimal MSE including constants as $\varepsilon \to \infty$. We also show a transformation from the discrete setting to the continuous setting, which allows us to transform both mechanisms to the continuous setting and thereby achieve the optimal $\Theta(\Delta^2 e^{-2\varepsilon/3})$ MSE. To our knowledge, these are the first infinitely divisible additive noise mechanisms that achieve order-optimal MSE under pure DP for either the discrete or continuous setting, so our work shows formally there is no separation in utility when query-independent noise adding mechanisms are restricted to infinitely divisible noise. For the continuous setting, our result improves upon Pagh and Stausholm's Arete distribution, which gives an MSE of $O(\Delta^2 e^{-\varepsilon/4})$ [39]. Furthermore, we give an exact sampler tuned to efficiently implement the MSDLap mechanism, and we apply our results to improve a state of the art multi-message shuffle DP protocol from [7] in the high $\varepsilon$ regime.
Keywords and phrases:
Differential Privacy, Distributed Noise Addition
2012 ACM Subject Classification:
Mathematics of computing → Distribution functions; Security and privacy
Acknowledgements:
Thanks to Peter Kairouz for useful discussions and for pointing out the prior work, especially [6]. This work was supported by Google.
1 Introduction
Differential Privacy (DP) [15] is a formal notion of privacy which bounds the sensitive information revealed by an algorithm. While there are many “flavors” of DP, most relevant to this work is so-called pure-DP or $\varepsilon$-DP, which bounds the privacy loss via the parameter $\varepsilon$.
Definition 1 ([15]).
A randomized mechanism $\mathcal{M}$ is $\varepsilon$-DP if, for all $X, X'$ differing in a single entry (for this work, we may consider any neighboring notion; we use the substitution notion for simplicity), we have $\Pr[\mathcal{M}(X) \in S] \le e^{\varepsilon} \cdot \Pr[\mathcal{M}(X') \in S]$ for all measurable $S$.
We focus on the so-called low-privacy regime where $\varepsilon \ge 1$. Despite its name, this regime still provides meaningful privacy protection and is the setting most often employed in practical applications of DP (e.g. [1, 4, 44]). The bounds we state throughout will focus on this regime.
A challenge in deploying DP is doing so while also producing useful results. In this work, we focus on the mean squared error (MSE) of a query $f$ subject to a query-independent additive noise mechanism, i.e., $\mathcal{M}(X) = f(X) + Z$ where $Z$ is a random variable. There is a rich body of work on optimizing MSE in this setting. Notably, the staircase mechanism ([18, 17, 19]) was shown to have the optimal MSE of all $\varepsilon$-DP, query-independent additive noise mechanisms. In the continuous setting, it achieves an MSE of $\Theta(\Delta^2 e^{-2\varepsilon/3})$. In the discrete setting, its MSE interpolates between $\Theta(\Delta^3 e^{-\varepsilon})$ and $\Theta(\Delta^2 e^{-2\varepsilon/3})$. For $\Delta = 1$, the optimal discrete staircase mechanism is the discrete Laplace (aka geometric) mechanism [25].
A probability distribution $\mathcal{D}$ is infinitely divisible iff for every positive integer $n$, there exists a distribution $\mathcal{D}_{1/n}$ such that, when we sample i.i.d. random variables $X_1, \dots, X_n \sim \mathcal{D}_{1/n}$, their sum $X_1 + \cdots + X_n$ is distributed as $\mathcal{D}$. In distributed DP, a common technique (see [26] for an overview) is for $n$ parties to each sample $X_i \sim \mathcal{D}_{1/n}$ such that the sum is distributed according to $\mathcal{D}$, which can be shown to protect the dataset with DP. The infinite divisibility property of $\mathcal{D}$ allows for distributed protocols where an arbitrary number of parties can participate. Under the more restrictive setting where the additive noise mechanism must sample the noise from an infinitely divisible distribution, there was previously no known mechanism in either the discrete or continuous settings which matched the MSE of the staircase mechanism. We resolve this gap in this paper for both settings.
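To make the distributed recipe concrete, here is a minimal Python sketch (our illustration, not code from any cited library; it assumes the NB convention used in Section 2, where $p$ is the decay parameter, so numpy's success probability is $1 - p$) in which $n$ parties jointly materialize a single discrete Laplace noise draw from negative binomial shares:

```python
import numpy as np

def dlap_shares(n, p, rng):
    """Each party samples NB(1/n, p) - NB(1/n, p); by infinite divisibility
    of NB, the n shares sum to one DLap(p) draw.  numpy's negative_binomial
    takes the success probability, hence the 1 - p below."""
    pos = rng.negative_binomial(1.0 / n, 1.0 - p, size=n)
    neg = rng.negative_binomial(1.0 / n, 1.0 - p, size=n)
    return pos - neg

rng = np.random.default_rng(0)
shares = dlap_shares(n=100, p=np.exp(-1.0), rng=rng)
total_noise = shares.sum()  # distributed as DLap(e^{-1})
```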
1.1 Related work
Distributed noise generation for differential privacy is well studied even for distributions that are not infinitely divisible. In fact, the idea dates back to the very early days of DP [14]. Moreover, several works have studied the setting where each party samples $X_i$ from some distribution $\mathcal{D}'$ and directly argue about the distribution of the summation $X_1 + \cdots + X_n$. Examples include the case where $\mathcal{D}'$ is a Bernoulli distribution [13, 20], for which the sum is a binomial random variable, and the case where $\mathcal{D}'$ is a discrete Gaussian distribution [33], for which the sum is “close” to a discrete Gaussian random variable. The drawback here is that the distribution of the sum is different for different values of $n$, meaning that the privacy analysis often requires $n$ to be sufficiently large (e.g. [20]) or the noise scale to be in a restricted regime (e.g. [33]). Using an infinitely divisible distribution overcomes this issue since the distribution of the total noise is always $\mathcal{D}$ regardless of the value of $n$, leading to a privacy analysis that works for all regimes of $n$. Due to this, infinitely divisible noise distributions have gained popularity in distributed settings of differential privacy (e.g. [26, 7, 22, 23, 2, 12, 6, 21]).
As discrete distributions are typically easier to embed in multi-party cryptographic protocols and avoid implementation issues [36] with floating point representations, they tend to be more well-studied in distributed DP. The infinite divisibility of the discrete Laplace into negative binomial shares (throughout this work, we use the term negative binomial to refer to the distribution generalized to a real-valued stopping parameter $r$; in other works, this is sometimes called the Pólya distribution) has been studied in [26, 7, 6]. In [6], the authors explicitly analyzed privacy in the face of dropouts, i.e., parties that fail to properly add their noise share.
In the continuous setting, the infinitely divisible Arete distribution was introduced in [39] specifically to target the low-privacy, high pure-DP regime. It was designed to match the performance of the (continuous) staircase mechanism, which is not infinitely divisible and therefore unusable in the distributed setting. While the continuous staircase mechanism achieves $\Theta(\Delta^2 e^{-2\varepsilon/3})$ MSE, the authors only proved that the Arete mechanism has an MSE of $O(\Delta^2 e^{-\varepsilon/4})$, though we believe this is not tight (Conjecture 30).
Distributed noise is relevant in other notions of differential privacy as well. The aforementioned discrete Gaussian mechanism [11, 33] satisfies zero-concentrated DP [10], and the Skellam mechanism [2, 43, 41] satisfies Rényi DP [37]. (Mechanisms satisfying Rényi DP or zero-concentrated DP naturally also satisfy approximate DP.) Both mechanisms were proposed in the context of federated learning via a secure aggregation multi-party protocol [34, 9, 8].
The GDL mechanism was explored informally in a prior blog post by the first author [27]. Concurrent to this work, the GDL distribution was studied in the context of shuffle DP in [5]. There, the authors analyzed a shifted and truncated GDL distribution that achieves an approximate DP guarantee, vs. the pure DP one in this work, which does not perform any truncation. Their primary result in the shuffle setting involves nearly matching the utility of the central discrete Laplace mechanism. On the other hand, as explained below, we improve on the central discrete Laplace mechanism for sufficiently large $\varepsilon$.
1.2 Our contributions
Table 1: Our noise distributions and previously known distributions.

Discrete Distributions
| Distribution | MSE | Inf. Div. | Reference |
| --- | --- | --- | --- |
| Discrete Laplace | $\frac{2e^{-\varepsilon/\Delta}}{(1 - e^{-\varepsilon/\Delta})^2}$ | ✓ | [25] |
| Discrete Staircase | $\Theta(\min(\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3}))$ | ✗ | [18, 17, 19] |
| Generalized Discrete Laplace (GDL) | $O(\Delta^3 e^{-\varepsilon})$ for $\varepsilon \ge 1 + \ln \Delta$ | ✓ | Theorem 15 |
| Multi-Scale Discrete Laplace (MSDLap) | $O(\min(\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3}))$ | ✓ | Corollary 19 |

Continuous Distributions
| Distribution | MSE | Inf. Div. | Reference |
| --- | --- | --- | --- |
| Laplace | $2\Delta^2/\varepsilon^2$ | ✓ | [15] |
| Staircase | $\Theta(\Delta^2 e^{-2\varepsilon/3})$ | ✗ | [17, 19] |
| Arete | $O(\Delta^2 e^{-\varepsilon/4})$ | ✓ | [39] |
| This work | $O(\Delta^2 e^{-2\varepsilon/3})$ | ✓ | Theorem 21 |
In Section 3, we introduce the GDL and MSDLap mechanisms, two discrete noise-adding mechanisms having MSE $O(\Delta^3 e^{-\varepsilon})$, which is optimal for fixed $\Delta$ and any sufficiently large $\varepsilon$. Inspired by the discrete staircase mechanism, we also extend the MSDLap mechanism with a parameter $r$, optionally allowing it to satisfy MSE $O(\Delta^2 e^{-2\varepsilon/3})$ and thereby achieve asymptotically optimal error for any fixed $\Delta$ and $\varepsilon \ge 1$. Notably, the MSDLap mechanism matches the MSE of the discrete staircase including constants as $\varepsilon \to \infty$.
The GDL mechanism, as the difference of two i.i.d. negative binomial noise shares, generalizes the distributed discrete Laplace mechanisms in [26, 7, 6]: in the face of unexpected dropouts, when a larger fraction of parties than expected fail to add their noise shares, it provides a “smooth” closed-form privacy guarantee.
In Section 4, we introduce a method to transform a discrete infinitely divisible additive mechanism into a continuous one, up to a small loss in parameters. This allows us to achieve the asymptotically optimal MSE $O(\Delta^2 e^{-2\varepsilon/3})$ by transforming either the GDL or the MSDLap mechanisms, improving on the Arete mechanism's bound of $O(\Delta^2 e^{-\varepsilon/4})$ in [39].
Our noise distributions and previously known distributions are summarized in Table 1.
While its utility exceeds the GDL's, the MSDLap mechanism naively requires sampling from $\Delta$ independent negative binomial random variables. In Section 5 we outline an improved exact sampling algorithm whose expected number of steps scales only with the mean total noise magnitude, which is $O(\Delta e^{-\varepsilon})$ in the relevant regimes. This algorithm may be of independent interest for general purpose multivariate negative binomial sampling in the sparse regime where most samples are 0.
2 Preliminaries
For any $n \in \mathbb{N}$, we write $[n]$ as a shorthand for $\{1, \dots, n\}$.
Function identities.
We introduce a few functions and identities used in our proofs. Let $\Gamma$ be the gamma function. We denote the rising factorial (aka the Pochhammer symbol) by $(a)_k = a(a+1)\cdots(a+k-1)$. Denote the hypergeometric function by ${}_2F_1$ [38], where
\[ {}_2F_1(a, b; c; z) = \sum_{k=0}^{\infty} \frac{(a)_k (b)_k}{(c)_k} \cdot \frac{z^k}{k!}. \tag{1} \]
Let $\psi$ denote the digamma function, where $\psi(x) = \Gamma'(x)/\Gamma(x)$. The following observation is well-known (see e.g. [3]):
Observation 2.
$\psi$ is increasing and $\psi'$ is decreasing on $(0, \infty)$.
Finally, we state the so-called “hockey-stick identity” (e.g. [32]):
Lemma 3.
For any non-negative integers $r \le n$, $\sum_{i=r}^{n} \binom{i}{r} = \binom{n+1}{r+1}$.
Distributions.
For convenience, we write a random variable (r.v.) and its distribution interchangeably.
For a discrete distribution $\mathcal{D}$ with support on $\mathbb{Z}$, denote its probability mass function (PMF) as $p_{\mathcal{D}}(k)$ for $k \in \mathbb{Z}$. When we say that a discrete distribution is infinitely divisible, we assume implicitly that the components $\mathcal{D}_{1/n}$ are also discrete. Relevant to this work are the following well-known discrete distributions (where $r > 0$ and $p \in (0, 1)$):
-
The negative binomial (NB) distribution, denoted $\mathrm{NB}(r, p)$ and with support on $\mathbb{Z}_{\ge 0}$, has PMF $p_{\mathrm{NB}(r, p)}(k) = \binom{k + r - 1}{k} (1-p)^r p^k$. It is infinitely divisible, as $\mathrm{NB}(r, p)$ is the sum of $n$ i.i.d. samples from $\mathrm{NB}(r/n, p)$. Its variance is $\frac{rp}{(1-p)^2}$. For integer $r$, the negative binomial distribution models the number of failures before the first $r$ successes in a sequence of i.i.d. Bernoulli trials with success probability $1 - p$.
-
The geometric distribution, denoted $\mathrm{Geo}(p)$, is a special case of NB with $r = 1$.
-
The discrete Laplace distribution, denoted $\mathrm{DLap}(p)$ and with support on $\mathbb{Z}$, has PMF $p_{\mathrm{DLap}(p)}(k) = \frac{1-p}{1+p} \cdot p^{|k|}$. Since $\mathrm{DLap}(p)$ is the difference of two independent $\mathrm{NB}(1, p)$ random variables, it inherits infinite divisibility from NB. Its variance is $\frac{2p}{(1-p)^2}$.
-
The Bernoulli distribution, denoted $\mathrm{Ber}(q)$, has PMF $p_{\mathrm{Ber}(q)}(1) = q$ and $p_{\mathrm{Ber}(q)}(0) = 1 - q$.
For a continuous distribution $\mathcal{D}$ on $\mathbb{R}$, let $f_{\mathcal{D}}(x)$ for $x \in \mathbb{R}$ be its probability density function (PDF) at $x$. Recall the following continuous distributions (where $k, \lambda, b > 0$):
-
The gamma distribution, denoted $\Gamma(k, \lambda)$ (with shape $k$ and scale $\lambda$), with support on $\mathbb{R}_{>0}$, has PDF $f(x) = \frac{1}{\Gamma(k)\lambda^k} x^{k-1} e^{-x/\lambda}$. It is infinitely divisible, as $\Gamma(k, \lambda)$ is the sum of $n$ i.i.d. samples from $\Gamma(k/n, \lambda)$.
-
The Laplace distribution, denoted $\mathrm{Lap}(b)$, with support on $\mathbb{R}$, has PDF $f(x) = \frac{1}{2b} e^{-|x|/b}$. Its variance is $2b^2$. It is infinitely divisible, as $\mathrm{Lap}(b)$ is the difference of two independent $\Gamma(1, b)$ random variables, each of which is itself infinitely divisible.
We will also use the following simple observations (where $X, Y$ are r.v.s):
Observation 4.
If $X$ is infinitely divisible, then $cX$ is infinitely divisible for any constant $c$.
Observation 5.
If $X$ and $Y$ are infinitely divisible and independent, then $X + Y$ is infinitely divisible.
We say that a distribution $\mathcal{D}$ is closed under summation if $\mathcal{D}$ is infinitely divisible and, additionally, the sum of any $k \le n$ of the i.i.d. shares $X_1, \dots, X_n \sim \mathcal{D}_{1/n}$ follows the same distribution family as $\mathcal{D}$, for all $k$ and $n$. This additional property provides benefits in the distributed setting as it ensures the mechanism's privacy is well-understood even as parties drop out of or join the protocol.
Max Divergence and DP.
Let $D_\infty(P \,\|\, Q)$ denote the max divergence between two distributions $P, Q$, i.e., $D_\infty(P \,\|\, Q) = \sup_{S} \ln \frac{\Pr[P \in S]}{\Pr[Q \in S]}$. We state the following well-known properties.
Lemma 6 (Post-Processing).
For any (possibly randomized) function $g$ and any random variables $X, Y$, we have $D_\infty(g(X) \,\|\, g(Y)) \le D_\infty(X \,\|\, Y)$.
Lemma 7 (Triangle Inequality).
For any distributions $P, Q, R$, we have $D_\infty(P \,\|\, R) \le D_\infty(P \,\|\, Q) + D_\infty(Q \,\|\, R)$.
For a query function $f$, we let $\Delta(f) = \max |f(X) - f(X')|$ where the maximum is over all pairs $X$ and $X'$ differing on one entry. The $\mathcal{D}$-noise addition mechanism for a query function $f$ is the mechanism that outputs $f(X) + Z$ where $Z$ is drawn from $\mathcal{D}$. For a discrete (resp. continuous) distribution $\mathcal{D}$ and $\Delta \in \mathbb{N}$ (resp. $\Delta \in \mathbb{R}_{>0}$), we say the $\mathcal{D}$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ iff the noise addition mechanism is $\varepsilon$-DP for all integer-valued (resp. real-valued) queries $f$ such that $\Delta(f) \le \Delta$. It follows from the definition of DP and max divergence that this condition translates to the following (see e.g. [18]):
Lemma 8.
For a discrete (resp. continuous) distribution $\mathcal{D}$ with $Z \sim \mathcal{D}$, the $\mathcal{D}$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ iff $D_\infty(Z \,\|\, Z + d) \le \varepsilon$ for all integers $d$ with $|d| \le \Delta$ (resp. if $D_\infty(Z \,\|\, Z + d) \le \varepsilon$ for all reals $d$ with $|d| \le \Delta$; the other direction of the implication for the continuous case does not hold since e.g. the set of points where the density ratio is large might have measure zero).
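As a quick sanity check of Lemma 8 (a standard calculation rather than a result specific to this paper), one can verify that $\mathrm{DLap}(e^{-\varepsilon/\Delta})$ noise is $\varepsilon$-DP for sensitivity $\Delta$: for $p = e^{-\varepsilon/\Delta}$ and any integer shift $|d| \le \Delta$,
\[ \frac{p_{\mathrm{DLap}(p)}(k)}{p_{\mathrm{DLap}(p)}(k + d)} = p^{|k| - |k + d|} \le p^{-|d|} = e^{\varepsilon |d| / \Delta} \le e^{\varepsilon}, \]
so $D_\infty(Z \,\|\, Z + d) \le \varepsilon$ for every such $d$, which is exactly the condition in Lemma 8.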
3 Discrete mechanisms
We present two infinitely divisible discrete noise distributions with order-optimal MSEs.
3.1 The generalized discrete Laplace mechanism
This section introduces the generalized discrete Laplace distribution and the associated mechanism. We start with the description of the distribution and its PMF:
Definition 9 ([35]).
For $\theta > 0$ and $p \in (0, 1)$, let $\mathrm{GDL}(\theta, p)$ denote the distribution of $X_1 - X_2$ where $X_1, X_2 \sim \mathrm{NB}(\theta, p)$ are independent. For all $k \in \mathbb{Z}$, the PMF of $\mathrm{GDL}(\theta, p)$ at $k$, i.e. $p_{\mathrm{GDL}(\theta, p)}(k)$, is
\[ p_{\mathrm{GDL}(\theta, p)}(k) = (1-p)^{2\theta} \, p^{|k|} \, \frac{(\theta)_{|k|}}{|k|!} \, {}_2F_1\big(\theta, \theta + |k|; |k| + 1; p^2\big). \tag{2} \]
The discrete Laplace distribution is a special case of the GDL with $\theta = 1$.
The infinite divisibility of NB immediately implies that GDL is also infinitely divisible:
Observation 10.
GDL is infinitely divisible and closed under summation. In particular, for independent $X \sim \mathrm{GDL}(\theta_1, p)$ and $Y \sim \mathrm{GDL}(\theta_2, p)$, we have $X + Y \sim \mathrm{GDL}(\theta_1 + \theta_2, p)$.
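The following short Python sketch (our illustration; the parameter names are ours) samples GDL shares and demonstrates the closure property: $n$ parties each contribute a $\mathrm{GDL}(\theta/n, p)$ share, and any $k$ surviving shares still sum to a $\mathrm{GDL}(k\theta/n, p)$ random variable, so the privacy of the aggregate degrades smoothly with dropouts:

```python
import numpy as np

def sample_gdl(theta, p, size, rng):
    """GDL(theta, p): difference of two independent NB(theta, p) draws.
    numpy's negative_binomial takes the success probability, i.e. 1 - p."""
    return (rng.negative_binomial(theta, 1.0 - p, size)
            - rng.negative_binomial(theta, 1.0 - p, size))

rng = np.random.default_rng(1)
n, theta, p = 50, 2.0, np.exp(-2.0)
shares = sample_gdl(theta / n, p, size=n, rng=rng)  # one share per party
aggregate = shares.sum()        # ~ GDL(theta, p) by Observation 10
survivors = shares[:40].sum()   # ~ GDL(0.8 * theta, p) if 10 parties drop out
```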
It will be convenient to consider the extension of the PMF to all real numbers; let $h : \mathbb{R} \to \mathbb{R}_{>0}$ be the function defined by (2) with $|k|$ replaced by $|x|$ (and the factorial by the gamma function). We start with the following lemma.
Lemma 11.
For $\theta \le 1$, $h$ is decreasing and log convex on $[0, \infty)$.
Proof.
Because the product of log convex and decreasing functions is log convex and decreasing, and $x \mapsto p^x$ is both log convex and decreasing, we focus on the remaining relevant term
\[ h_0(x) = \sum_{j \ge 0} \frac{(\theta)_j \, p^{2j}}{j!} \cdot e^{g_j(x)}, \qquad \text{where } g_j(x) = \ln \frac{\Gamma(\theta + x + j)}{\Gamma(1 + x + j)}. \]
First we show that $g_j'(x) = \psi(\theta + x + j) - \psi(1 + x + j) \le 0$ and that $g_j''(x) = \psi'(\theta + x + j) - \psi'(1 + x + j) \ge 0$. The derivatives follow from the definition of $\psi$, and the inequalities follow from Observation 2 together with $\theta \le 1$. Finally, the result follows as the sum of decreasing and log convex functions is decreasing and log convex.
We also need the following technical lemma which, together with Lemma 11, allows us to consider only the ratio at $x = 0$. Its proof is deferred to Section A.1.
Lemma 12.
Let $f$ be positive, symmetric about 0, and decreasing and log convex on $[0, \infty)$. Then for any $x \in \mathbb{R}$ and $d \ne 0$, we have $\frac{f(x)}{f(x + d)} \le \frac{f(0)}{f(|d|)}$.
We are now ready to state the privacy guarantee of the GDL-noise addition mechanism.
Theorem 13.
For any $p \in (0, 1)$, $\theta \in (0, 1]$, and $\Delta \in \mathbb{N}$, the $\mathrm{GDL}(\theta, p)$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ iff
\[ \varepsilon \ge \ln\!\left( \frac{\Delta!}{(\theta)_\Delta \, p^\Delta} \cdot \frac{{}_2F_1(\theta, \theta; 1; p^2)}{{}_2F_1(\theta, \theta + \Delta; \Delta + 1; p^2)} \right). \]
For $\theta > 1$, the mechanism is $\varepsilon$-DP whenever the above holds with $\theta$ replaced by 1.
Proof.
For the case $\theta \le 1$, by Lemmas 8 and 12, along with the fact that $h$ is log convex and decreasing on $[0, \infty)$ (Lemma 11) and symmetric about 0, the $\mathrm{GDL}(\theta, p)$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ iff $\ln \frac{h(0)}{h(\Delta)} \le \varepsilon$, which is exactly the claimed bound.
For $\theta > 1$, we decompose the mechanism by observing (see Observation 10) that $\mathrm{GDL}(\theta, p)$ can be sampled as $X + Y$ where $X \sim \mathrm{GDL}(1, p)$ and $Y \sim \mathrm{GDL}(\theta - 1, p)$ are independent. By post-processing (Lemma 6), the $\mathrm{GDL}(\theta, p)$-noise addition mechanism is at least as private as the $\mathrm{GDL}(1, p)$-noise addition mechanism, which is $\varepsilon$-DP for sensitivity $\Delta$ under the stated condition with $\theta = 1$.
For the tightness claim, first note that Equation 1 allows a term-by-term comparison of the two hypergeometric series. Observe that $\frac{(\theta + \Delta)_k}{(\Delta + 1)_k} \ge \frac{(\theta)_k}{(1)_k}$ for $\theta \le 1$, because every factor in the product $\prod_{j=0}^{k-1} \frac{(\theta + \Delta + j)(1 + j)}{(\Delta + 1 + j)(\theta + j)}$ is at least one. Plugging this into the above, we get
\[ \frac{{}_2F_1(\theta, \theta; 1; p^2)}{{}_2F_1(\theta, \theta + \Delta; \Delta + 1; p^2)} \le 1, \qquad \text{hence} \qquad D_\infty(Z \,\|\, Z + \Delta) \le \ln \frac{\Delta!}{(\theta)_\Delta \, p^\Delta}. \]
Thus, together with the reverse estimate $\frac{{}_2F_1(\theta, \theta; 1; p^2)}{{}_2F_1(\theta, \theta + \Delta; \Delta + 1; p^2)} \ge (1 - p^2)^{\theta}$, the condition in the theorem pins down the privacy loss to within an additive $\theta \ln \frac{1}{1 - p^2}$, meaning that our bound is tight.
The exact privacy bound for GDL above is fairly unwieldy. We show a simplified bound below, as well as a tighter, slightly more complex bound in Section A.2.
Corollary 14.
For any $p \in (0, 1)$ and $\theta \in (0, 1]$, the $\mathrm{GDL}(\theta, p)$-noise addition mechanism is $\tilde{\varepsilon}$-DP for sensitivity $\Delta$, where $\tilde{\varepsilon} = \Delta \ln(1/p) + \ln \frac{\Delta}{\theta} + \theta \ln \frac{1}{1 - p^2}$.
Proof.
Since $(\theta)_k / k! \le 1$ for all $k \ge 0$ when $\theta \le 1$, we can bound the ratio of the hypergeometric functions by
\[ \frac{{}_2F_1(\theta, \theta; 1; p^2)}{{}_2F_1(\theta, \theta + \Delta; \Delta + 1; p^2)} \le {}_2F_1(\theta, \theta; 1; p^2) \le \sum_{k=0}^{\infty} \frac{(\theta)_k}{k!} \, p^{2k} = (1 - p^2)^{-\theta}, \]
where the first inequality uses ${}_2F_1(\theta, \theta + \Delta; \Delta + 1; p^2) \ge 1$. Thus, from Theorem 13, the mechanism is $\varepsilon$-DP for all $\varepsilon \ge \ln \frac{\Delta!}{(\theta)_\Delta \, p^\Delta} + \theta \ln \frac{1}{1 - p^2}$. Observe also that $(\theta)_\Delta \ge \theta \cdot (\Delta - 1)!$, so $\frac{\Delta!}{(\theta)_\Delta} \le \frac{\Delta}{\theta}$. Combining these yields the desired claim.
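The bound in Corollary 14 is easy to compare numerically against the exact expression from Theorem 13; the sketch below (ours, using mpmath's hyp2f1 and rf for the hypergeometric function and the Pochhammer symbol) prints both quantities for a sample parameter setting:

```python
import math
from mpmath import hyp2f1, rf  # rf(a, k) is the rising factorial (a)_k

def gdl_eps_exact(theta, p, delta):
    """ln(P[Z = 0] / P[Z = delta]) for Z ~ GDL(theta, p), per Theorem 13."""
    ratio = (math.factorial(delta) / (float(rf(theta, delta)) * p**delta)
             * float(hyp2f1(theta, theta, 1, p * p))
             / float(hyp2f1(theta, theta + delta, delta + 1, p * p)))
    return math.log(ratio)

def gdl_eps_simplified(theta, p, delta):
    """The simplified upper bound from Corollary 14."""
    return (delta * math.log(1 / p) + math.log(delta / theta)
            + theta * math.log(1 / (1 - p * p)))

theta, p, delta = 0.05, math.exp(-0.5), 4
print(gdl_eps_exact(theta, p, delta), gdl_eps_simplified(theta, p, delta))
```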
Finally, we prove our accuracy guarantee for the GDL mechanism in the high $\varepsilon$ regime.
Theorem 15.
For any $\Delta \in \mathbb{N}$ and $\varepsilon \ge 1 + \ln \Delta$, the $\mathrm{GDL}(\theta, p)$-noise addition mechanism with $p = e^{-1/\Delta}$ and a suitable $\theta = \Theta(\Delta e^{-\varepsilon})$ is $\varepsilon$-DP for sensitivity $\Delta$ and has MSE $O(\Delta^3 e^{-\varepsilon})$.
Proof.
Since $\varepsilon \ge 1 + \ln \Delta$, we have $\theta \le 1$. Thus, Corollary 14 implies the privacy guarantee: with $p = e^{-1/\Delta}$ we have $\Delta \ln(1/p) = 1$, and choosing the constant in $\theta$ appropriately makes $\tilde{\varepsilon} \le \varepsilon$. Finally, the MSE is
\[ \mathrm{Var}\big(\mathrm{GDL}(\theta, p)\big) = \frac{2\theta p}{(1 - p)^2} = \Theta(\Delta e^{-\varepsilon}) \cdot O(\Delta^2) = O(\Delta^3 e^{-\varepsilon}). \]
3.2 The multi-scale discrete Laplace mechanism
The $(\Delta, \varepsilon)$-multi-scale discrete Laplace ($(\Delta, \varepsilon)$-MSDLap) distribution with parameters $\Delta \in \mathbb{N}$ and $\varepsilon > 0$ is defined as the distribution of $Z = \sum_{i \in [\Delta]} i \cdot Y_i$ where $Y_1, \dots, Y_\Delta \overset{\text{i.i.d.}}{\sim} \mathrm{DLap}(e^{-\varepsilon})$. From Observation 4 and Observation 5, the $(\Delta, \varepsilon)$-MSDLap distribution is infinitely divisible.
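A direct (if naive) sampler for this distribution is short; the sketch below (our illustration) uses the fact that $\mathrm{DLap}(p)$ is the difference of two i.i.d. geometric random variables. Section 5 replaces this $O(\Delta)$-variables approach with a sparse sampler.

```python
import numpy as np

def sample_msdlap(delta, eps, rng):
    """One draw of Z = sum_{i=1}^{delta} i * Y_i, Y_i ~ DLap(e^{-eps}) i.i.d.
    DLap(p) is sampled as a difference of two i.i.d. geometric variables;
    numpy's geometric takes the success probability 1 - p, and its +1
    support offset cancels in the difference."""
    p = np.exp(-eps)
    y = rng.geometric(1.0 - p, size=delta) - rng.geometric(1.0 - p, size=delta)
    return int(np.dot(np.arange(1, delta + 1), y))
```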
This mechanism is $\varepsilon$-DP, and its accuracy guarantee matches that of Theorem 15:
Theorem 16.
For any $\Delta \in \mathbb{N}$ and $\varepsilon > 0$, the $(\Delta, \varepsilon)$-MSDLap-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$. Furthermore, for $\varepsilon \ge 1$, the MSE is $O(\Delta^3 e^{-\varepsilon})$.
Proof.
(Privacy) From Lemma 8 and the symmetry of the noise around 0, it suffices to show
\[ D_\infty(Z \,\|\, Z + d) \le \varepsilon \]
for all $d \in [\Delta]$. Viewing both $Z$ and $Z + d$ as the same function of $(Y_1, \dots, Y_\Delta)$ with only the $d$th coordinate shifted by one, Lemma 6 gives
\[ D_\infty(Z \,\|\, Z + d) \le D_\infty(Y_d \,\|\, Y_d + 1) \le \varepsilon, \]
where the last inequality follows from $Y_d \sim \mathrm{DLap}(e^{-\varepsilon})$.
(Accuracy) The MSE is
\[ \mathrm{Var}(Z) = \sum_{i \in [\Delta]} i^2 \cdot \mathrm{Var}(Y_i) = \frac{\Delta(\Delta + 1)(2\Delta + 1)}{6} \cdot \frac{2 e^{-\varepsilon}}{(1 - e^{-\varepsilon})^2} = O(\Delta^3 e^{-\varepsilon}), \]
where the last step uses $\varepsilon \ge 1$.
Theorem 16 implies that, for fixed $\Delta$ and sufficiently large $\varepsilon$, the $(\Delta, \varepsilon)$-MSDLap noise achieves asymptotically optimal MSE. In fact, below we prove a stronger statement: as $\varepsilon \to \infty$, the ratio between the MSE of MSDLap and the optimal MSE [17, 19] approaches 1.
Theorem 17.
For any fixed $\Delta \in \mathbb{N}$, the ratio between the MSE of the $(\Delta, \varepsilon)$-MSDLap mechanism and the optimal MSE of $\varepsilon$-DP noise addition for sensitivity $\Delta$ [17, 19] approaches 1 as $\varepsilon \to \infty$.
Proof.
As we show in Observation 35 (in Section A.3), the MSE of the $\varepsilon$-DP discrete staircase mechanism for sensitivity $\Delta$ is at least $(1 - o(1)) \cdot \frac{\Delta(\Delta + 1)(2\Delta + 1)}{3} e^{-\varepsilon}$ for any sufficiently large $\varepsilon$. Thus, in this regime, the ratio between the two MSEs is at most $\frac{1 + o(1)}{(1 - e^{-\varepsilon})^2}$. The RHS approaches 1 as $\varepsilon \to \infty$, as claimed.
While the naive approach to sample from the MSDLap distribution requires sampling $\Delta$ random variables, we show in Section 5 an efficient algorithm for the high $\varepsilon$ regime.
3.2.1 Generalizing the MSDLap mechanism
We can also generalize the MSDLap mechanism to match the error of the discrete staircase mechanism in [18] for every setting of parameters $\varepsilon, \Delta$. We state this below, where $r$ is a parameter chosen so that it matches the “$r$” parameter in the discrete staircase mechanism in [18]. In our results henceforth, we assume that $\varepsilon \ge 2$ for simplicity; this 2 can be changed to any constant (by changing the distribution of $Y$ in the proof of Theorem 18 to a discrete Laplace with a different scale), but we keep it for simplicity of the distribution description.
Theorem 18.
For any $\varepsilon \ge 2$, $\Delta \in \mathbb{N}$, and $r \in [\Delta]$, there is an infinitely divisible discrete noise-addition mechanism that is $\varepsilon$-DP for sensitivity $\Delta$ and has MSE $O\big(e^{-\varepsilon} \Delta^3 / r + r^2\big)$.
Plugging in $r = \max\{1, \lceil \Delta e^{-\varepsilon/3} \rceil\}$ gives the following, which will be useful later in Section 6.
Corollary 19.
For any $\varepsilon \ge 2$ and $\Delta \in \mathbb{N}$, there exists an infinitely divisible discrete noise-addition mechanism that is $\varepsilon$-DP for sensitivity $\Delta$ with MSE $O\big(\min(\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3})\big)$.
We are now ready to prove Theorem 18. The rough idea is that, instead of using only the $(\Delta, \varepsilon)$-MSDLap noise, we first add a scaled-up noise $r \cdot X$ where $X$ follows a coarser MSDLap distribution. The scaling up leaves us with “holes” in the noise distribution. To fix this, we additionally add another DLap noise to “smooth out the holes”. This idea is formalized below.
Proof of Theorem 18.
The case $r = 1$ follows from Theorem 16.
For $r \ge 2$, let $\Delta' = \lceil \Delta / r \rceil$. Let the noise distribution be the distribution of $Z = r \cdot X + Y$ where $X \sim (\Delta', \varepsilon - 1)$-MSDLap and $Y \sim \mathrm{DLap}(e^{-1/r})$ are independent. The infinite divisibility of $Z$ follows from the infinite divisibility of MSDLap, DLap and Observations 4 and 5.
(Privacy) From Lemma 8 and symmetry, it suffices to show $D_\infty(Z \,\|\, Z + d) \le \varepsilon$ for all $d \in [\Delta]$. Write $d = r \cdot d_1 + d_2$ with $d_2 = d \bmod r \in \{0, \dots, r - 1\}$ and $d_1 = (d - d_2)/r$. Note that $|d_1| \le \Delta'$ and $|d_2| \le r - 1$. From Lemmas 7 and 6,
\[ D_\infty(Z \,\|\, Z + d) \le D_\infty(X \,\|\, X + d_1) + D_\infty(Y \,\|\, Y + d_2) \le (\varepsilon - 1) + 1 = \varepsilon, \]
where the last inequality follows from $|d_2| \cdot \frac{1}{r} \le 1$ and $|d_1| \le \Delta'$ (together with Theorem 16 and Lemma 8).
(Accuracy) The MSE is
\[ r^2 \cdot \mathrm{Var}(X) + \mathrm{Var}(Y) = r^2 \cdot O\big((\Delta')^3 e^{-(\varepsilon - 1)}\big) + O(r^2) = O\big(e^{-\varepsilon} \Delta^3 / r + r^2\big). \]
Remark 20.
We can generalize the proof of Theorem 18 further by considering a generalized multi-scale mechanism that adds noise from two distributions $\mathcal{D}_1, \mathcal{D}_2$ where
-
the $\mathcal{D}_1$-noise addition mechanism is $(\varepsilon - 1)$-DP for sensitivity 1, and is added at multiple scales, and
-
the $\mathcal{D}_2$-noise addition mechanism is $1$-DP for sensitivity $r - 1$, and is used to “smooth out the holes”.
Specifically, the noise is $Z = r \cdot \sum_{i \in [\Delta']} i \cdot X_i + Y$ where $X_1, \dots, X_{\Delta'} \sim \mathcal{D}_1$ and $Y \sim \mathcal{D}_2$ are independent. The privacy proof proceeds identically to Theorem 18, yielding an $\varepsilon$-DP mechanism. This allows us to take $\mathcal{D}_1$ and $\mathcal{D}_2$ both from the GDL family (with a common parameter $p$), which gives us the multi-scale version of the GDL mechanism; this noise distribution is additionally closed under summation.
We plot the MSE of our new mechanisms and the established baselines in Figure 1.
4 From Discrete to Continuous
We next show a simple method to transform discrete noise into continuous noise. The approach is similar to Theorem 18, except that we use (continuous) Laplace noise to “smooth out the holes”.
Theorem 21.
For any $\varepsilon \ge 2$ and $\Delta > 0$, there exists a continuous infinitely divisible noise-addition mechanism that is $\varepsilon$-DP for sensitivity $\Delta$ with MSE $O(\Delta^2 e^{-2\varepsilon/3})$.
Proof.
By scaling, we may assume w.l.o.g. that $\Delta = 1$.
Let $\varepsilon' = \varepsilon - 1$ and $\Delta' = \lceil e^{\varepsilon/3} \rceil$. Let $\mathcal{D}$ be any infinitely divisible discrete distribution such that the $\mathcal{D}$-noise addition mechanism is $\varepsilon'$-DP for sensitivity $\Delta'$ with MSE $O\big((\Delta')^3 e^{-\varepsilon'}\big)$. (Such a distribution exists due to Theorem 16. GDL also meets the requirements, but only for a specific regime of $\varepsilon$: Theorem 15 requires $\varepsilon' \ge 1 + \ln \Delta'$, so the continuous transformation of GDL is valid only for sufficiently large $\varepsilon$.)
Let $\mathcal{D}^*$ be the distribution of $Z = X / \Delta' + Y$ where $X \sim \mathcal{D}$ and $Y \sim \mathrm{Lap}(1/(2\Delta'))$ are independent. Since $X$ and $Y$ are infinitely divisible, Observations 4 and 5 imply that $\mathcal{D}^*$ is infinitely divisible.
(Privacy) From Lemma 8, it suffices to show $D_\infty(Z \,\|\, Z + v) \le \varepsilon$ for all $v \in [-1, 1]$. Let $k$ be the closest integer to $v \Delta'$ and let $t = v - k/\Delta'$. Note that $|k| \le \Delta'$ and $|t| \le \frac{1}{2\Delta'}$. (This choice of $k$, which halves the maximum value of the residual shift, cannot be directly applied to Theorem 18; the approach here works because the residual shift $t$ may be an arbitrary real, which the continuous Laplace noise can absorb.) From Lemmas 7 and 6, we have
\[ D_\infty(Z \,\|\, Z + v) \le D_\infty(X \,\|\, X + k) + D_\infty(Y \,\|\, Y + t) \le \varepsilon' + 1 = \varepsilon, \]
where the last inequality follows from $|t| \cdot 2\Delta' \le 1$ and from the $\mathcal{D}$-noise addition mechanism being $\varepsilon'$-DP for sensitivity $\Delta'$ (and Lemma 8).
(Accuracy) The MSE is
\[ \frac{\mathrm{Var}(X)}{(\Delta')^2} + \mathrm{Var}(Y) = O\big(\Delta' e^{-\varepsilon'}\big) + O\big(1/(\Delta')^2\big) = O\big(e^{-2\varepsilon/3}\big), \]
where the last equality is from our choice $\Delta' = \lceil e^{\varepsilon/3} \rceil$.
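The transformation is mechanical enough to state as code. The following sketch (ours, reusing the sample_msdlap sketch from Section 3.2 and the parameter choices $\varepsilon' = \varepsilon - 1$, $\Delta' = \lceil e^{\varepsilon/3} \rceil$ from the proof above) produces the continuous noise for sensitivity $\Delta = 1$:

```python
import math
import numpy as np

def sample_continuous_noise(eps, rng):
    """Z = X / delta_p + Lap(1/(2*delta_p)) with X ~ (delta_p, eps-1)-MSDLap,
    following the proof of Theorem 21 (sensitivity 1)."""
    delta_p = math.ceil(math.exp(eps / 3))
    x = sample_msdlap(delta_p, eps - 1.0, rng)           # discrete part
    y = rng.laplace(loc=0.0, scale=1.0 / (2 * delta_p))  # smooths out the holes
    return x / delta_p + y
```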
5 Efficiently and exactly sampling from the distributed MSDLap
This section outlines an algorithm to efficiently sample from the MSDLap mechanism described in Theorem 16 in the distributed setting over $n$ parties. Recall that the $(\Delta, \varepsilon)$-MSDLap noise is defined as $\sum_{i \in [\Delta]} i \cdot Y_i$ where $Y_i \sim \mathrm{DLap}(e^{-\varepsilon})$. In other words, each of the $n$ parties needs to sample $\sum_{i \in [\Delta]} i \cdot (X_i^{+} - X_i^{-})$ where $X_i^{+}, X_i^{-} \sim \mathrm{NB}(1/n, e^{-\varepsilon})$. Thus, the naive algorithm requires each party to sample from $2\Delta$ negative binomial random variables. For large $\Delta$, such as the settings arising from Corollary 19 in Section 6, this may be computationally expensive.
Our algorithm resolves this issue, allowing us to sample from a number of negative binomial random variables that is exponentially large in $\varepsilon$ in time polynomial in $\varepsilon$.
Theorem 22.
For input $\Delta$, $r$, and $p$, Algorithm 3 returns the non-zero samples among $\Delta$ i.i.d. samples of $\mathrm{NB}(r, p)$ and completes in $\tilde{O}(1 + \lambda)$ steps in expectation, where $\lambda = \frac{\Delta r p}{1 - p}$ is the mean of the total count and $\tilde{O}$ hides polynomial factors in the bit complexity of the parameters.
We leverage the fact that in the high $\varepsilon$ regime, most of these NB random variables will be 0. We re-frame the problem of sampling many negative binomials into two separate problems:
-
Sampling from the sum of many i.i.d. NBs.
-
Fairly allocating the result across each r.v.
While sampling from the sum of many NBs is simple on its face given their infinite divisibility (the sum of $\Delta$ i.i.d. $\mathrm{NB}(r, p)$ samples is distributed as $\mathrm{NB}(\Delta r, p)$), standard samplers for $\mathrm{NB}(R, p)$ (e.g. [28]) take time linear in $R$, which is not desirable. Below we will describe an algorithm (Algorithm 1) whose (expected) running time only scales with the mean of $\mathrm{NB}(\Delta r, p)$, which is only $O(\Delta e^{-\varepsilon})$ in our setting.
To fairly allocate across the random variables, we leverage the fact that the conditional distribution of the sequence of random variables given their sum $m$ follows the Dirichlet-multinomial distribution, denoted $\mathrm{DM}(m; r, \dots, r)$. Its PMF is
\[ p_{\mathrm{DM}(m; r, \dots, r)}(c_1, \dots, c_\Delta) = \binom{m}{c_1, \dots, c_\Delta} \cdot \frac{\prod_{i \in [\Delta]} (r)_{c_i}}{(\Delta r)_m} \qquad \text{for } c_1 + \cdots + c_\Delta = m. \]
Lemma 23 (e.g. [45, 42]).
Let $X_1, \dots, X_\Delta \sim \mathrm{NB}(r, p)$ be independent. Let $m = X_1 + \cdots + X_\Delta$. Then the conditional distribution of $(X_1, \dots, X_\Delta)$ given $m$ is $\mathrm{DM}(m; r, \dots, r)$.
Our sampler can be implemented on a finite computer in the Word RAM model, avoiding any real-arithmetic operations. Following [11, Section 5], we measure the runtime in the expected number of arithmetic operations, each of which takes only polynomial time in the bit complexity of the parameters. We make the following assumptions on the availability of sampling primitives requiring only $O(1)$ arithmetic operations in expectation:
-
A uniform sampler to draw from $\{1, \dots, k\}$ for any positive integer $k$.
-
A sampler for $\mathrm{Ber}(q)$ with rational $q$, which trivially follows from the uniform sampler.
-
A sampler for $\mathrm{Geo}(p)$ with rational $p$, from [11].
Finally, we assume a map data structure with $O(1)$ accesses and updates in expectation, and a vector data structure with $O(1)$ random access and append operations.
Algorithm 1 (NBSample). Input: $r > 0$ and rational $p \in (0, 1)$.
Output: A sample from $\mathrm{NB}(r, p)$.
Algorithm 2 (DMSample). Input: $m$, $\Delta$, and $r$.
Output: A sample from $\mathrm{DM}(m; r, \dots, r)$, encoded as a sparse map from variate index to count, with zero variates removed.
Algorithm 3 (SparseMultiNBSample). Input: $\Delta$, $r$, and rational $p \in (0, 1)$.
Output: Non-zero samples among $(X_1, \dots, X_\Delta)$ where $X_i \sim \mathrm{NB}(r, p)$ i.i.d.
Our Algorithm 3 is straightforward (see the Python sketch after this list). It consists of
-
1.
Sampling from $\mathrm{NB}(\Delta r, p)$ with Algorithm 1 to learn the sum of all the $\Delta$ terms, handling non-integer values of $\Delta r$ following the approach in [28], using a simple rejection sampler.
-
2.
Sampling from the Dirichlet-multinomial distribution with Algorithm 2, which uses a version of the Pólya urn process [31] modified to handle fractional numbers of balls. We sparsely encode the output to avoid storing zero entries, as MSDLap sampling only requires summing the non-zero random variables.
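The pseudocode of Algorithms 1–3 is given above only in outline; the following self-contained Python sketch (our reconstruction of the two steps, using floating-point arithmetic for brevity where the Word RAM implementation described above would use exact rational arithmetic and the primitives of [11]) implements the sum-then-allocate approach:

```python
import math
import random

def geo(p):
    """Run length of successes before the next failure: P(L = l) = (1-p)^l * p."""
    u = random.random()
    return int(math.log(u) / math.log(1.0 - p))

def int_nb(r_int, p):
    """Algorithm 1, integer case (IntSample): failures before the r_int-th
    success in Bernoulli(1 - p) trials; expected iterations 1 + r_int*p/(1-p)."""
    successes, failures = 0, 0
    while True:
        successes += geo(p)        # run of successes before the next failure
        if successes >= r_int:
            return failures
        failures += 1

def nb(r, p):
    """Algorithm 1: NB(r, p) for real r >= 0 as NB(floor(r), p) plus a simple
    rejection sampler for the fractional part f in (0, 1): propose from
    NB(1, p) and accept with probability (f)_k / k!."""
    m, f = int(r), r - int(r)
    total = int_nb(m, p) if m > 0 else 0
    if f > 0.0:
        while True:
            k = int_nb(1, p)
            accept = 1.0
            for j in range(k):
                accept *= (f + j) / (j + 1)
            if random.random() < accept:
                return total + k
    return total

def dm_sparse(m, delta, r):
    """Algorithm 2: Polya urn with fractional initial weight r per color;
    returns a sparse map color -> count distributed as DM(m; r, ..., r)."""
    picks, counts = [], {}
    for t in range(m):
        if random.random() < (delta * r) / (delta * r + t):
            color = random.randrange(delta)  # proportional to initial weight
        else:
            color = random.choice(picks)     # proportional to past selections
        picks.append(color)
        counts[color] = counts.get(color, 0) + 1
    return counts

def sparse_multi_nb(delta, r, p):
    """Algorithm 3: non-zero samples among delta i.i.d. NB(r, p) variables."""
    m = nb(delta * r, p)   # total count, distributed as NB(delta * r, p)
    return dm_sparse(m, delta, r)
```

For the distributed MSDLap of Theorem 16, each party invokes sparse_multi_nb twice (for the positive and negative NB shares) with $r = 1/n$ and $p = e^{-\varepsilon}$, and sums $i \cdot X_i$ over the returned non-zero entries.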
Proposition 24.
For input $r$ and $p$, Algorithm 1 returns one sample from $\mathrm{NB}(r, p)$ and completes in $\tilde{O}(1 + \mu)$ arithmetic operations in expectation, where $\mu = \frac{rp}{1-p}$ is the mean of $\mathrm{NB}(r, p)$ and $\tilde{O}$ hides polynomial factors in the bit complexity of $r$ and $p$.
Proof.
We begin by analyzing the case of integer $r$, i.e., the IntSample subroutine. Let $K$ be the r.v. describing the output. The subroutine simulates the underlying sequence of i.i.d. Bernoulli trials with success probability $1 - p$ one run at a time: each loop iteration draws the geometrically distributed number of successes before the next failure, and the loop stops once $r$ successes have accumulated, returning the number of failures seen so far. For every $k \ge 0$, the event $K = k$ occurs iff the trial sequence contains exactly $k$ failures before its $r$th success; thus, the output follows the distribution $\mathrm{NB}(r, p)$ as desired.
The expected number of iterations of the loop is exactly $1 + \frac{rp}{1-p}$: one per failure, plus the final accepting iteration. Each iteration takes $\tilde{O}(1)$ expected time, as the geometric sampler requires a number of arithmetic operations polynomial in the bit complexity of $p$.
Next we analyze the outer loop, which handles non-integer $r$ using the accept-reject approach in [28], ensuring that in each iteration, for a proposed sample $K'$ and acceptance event $A$:
\[ \Pr[K' = k \mid A] = p_{\mathrm{NB}(r, p)}(k). \]
This implies that the output follows the distribution $\mathrm{NB}(r, p)$ as desired, and that $\Pr[A] = \Omega(1 - p)$. The latter in turn implies that the number of trials follows a geometric distribution with success probability $\Pr[A]$. Therefore the expected number of trials is $O\big(\frac{1}{1 - p}\big)$, and the result follows as $\frac{1}{1-p} = O(1 + \mu)$ in our regime.
Proposition 25.
For input $m$, $\Delta$, and $r$, Algorithm 2 returns one sample from $\mathrm{DM}(m; r, \dots, r)$ and requires $O(m)$ arithmetic operations in expectation.
Proof.
We first map Algorithm 2 to the Pólya urn model:
-
To initialize the urn contents, $r$ balls (possibly a fractional number) are added to the urn for each of the $\Delta$ colors.
-
The algorithm proceeds over $m$ steps. At each step, a ball is chosen uniformly at random from the urn. When a ball is selected, it is replaced, along with one additional ball of that color.
-
For each color, the algorithm returns the count of how many times that color was chosen.
Algorithm 2 implements this, as the weight $r$ maps to the initial ball count of each color $i \in [\Delta]$, and the vector of past selections is the list of selected colors. Note that the return value is sparsely encoded to avoid storing zero entries, i.e., colors that have never been picked. For purposes of the proof of correctness we assume an unrolled output equal to $(c_1, \dots, c_\Delta)$, where each $c_i$ counts the number of times the $i$th color was picked. This can be obtained with a simple post-processing step.
Let $C_t$ be the color of the ball chosen at iteration $t$. The probability that $C_t = i$ given the previous draws is $\frac{r + c_i^{(t-1)}}{\Delta r + (t - 1)}$, where $c_i^{(t-1)}$ denotes the number of previous draws of color $i$.
Note that this probability depends only on the current state of the urn, not the order in which balls were picked. Similarly, note that the denominator of the fraction is independent of $i$. From this we can show that the probability of seeing any one particular sequence which has color counts $(c_1, \dots, c_\Delta)$ is
\[ \frac{\prod_{i \in [\Delta]} (r)_{c_i}}{(\Delta r)_m}. \]
Since there are $\binom{m}{c_1, \dots, c_\Delta}$ sequences with color counts $(c_1, \dots, c_\Delta)$, we have $\Pr[(c_1, \dots, c_\Delta)] = \binom{m}{c_1, \dots, c_\Delta} \cdot \frac{\prod_{i} (r)_{c_i}}{(\Delta r)_m}$, proving the correctness.
For the run time analysis, both loops in the algorithm iterate $m$ times, and each iteration requires only a constant number of arithmetic operations in expectation, assuming $O(1)$ map and vector operations.
Lemma 23, Proposition 24, and Proposition 25 immediately show Theorem 22.
6 Order optimal MSE in the multi-message shuffle model
We next apply the noises from Section 3 to protocols in the shuffle model of DP.
First, we recall the definition of the shuffle model [13, 16]. An $\ell$-message protocol in the shuffle model consists of a randomizer $R : \mathcal{X} \to \mathcal{Y}^{\ell}$, where $\mathcal{Y}$ denotes the set of all possible messages, and an analyzer $A : \mathcal{Y}^{n\ell} \to \mathcal{Z}$, where $\mathcal{Z}$ denotes the output domain. In the shuffle model, the analyst does not see the output of each randomizer directly but only the randomly shuffled messages. We write $S(x_1, \dots, x_n)$ to denote the output of randomly shuffling the (random) messages produced by $R(x_i)$ for $i \in [n]$. The shuffle model requires that these shuffled messages satisfy DP, as stated more formally below.
Definition 26 ([13, 16]).
An $\ell$-message protocol is $\varepsilon$-shuffle-DP if, for every $X, X'$ differing in a single entry, $\Pr[S(X) \in T] \le e^{\varepsilon} \cdot \Pr[S(X') \in T]$ for all sets $T$ of message multisets.
In the real summation problem, each input $x_i$ is a real number in $[0, 1]$ and the goal is to compute $\sum_{i \in [n]} x_i$. Our main result of this section can be stated as follows:
Theorem 27.
For any $\varepsilon \ge 2$, there is an $\varepsilon$-shuffle-DP, $O\big(1 + \frac{\varepsilon}{\log n}\big)$-message protocol for real summation with MSE $O(e^{-2\varepsilon/3})$, where each message is $O(\varepsilon + \log n)$ bits.
Prior to this work, the best known protocols have MSE of $O(1/\varepsilon^2)$ (even for large $\varepsilon$) [7, 24, 23], and thus our result provides a significant improvement in the large $\varepsilon$ regime. (Pagh and Stausholm [39, Corollary 23] claim that their result implies a protocol with absolute error $e^{-\Omega(\varepsilon)}$, but, to our knowledge, this is not the case. See the discussion at the end of this section.)
To prove this result, it is convenient to define the $\mathbb{Z}_q$-summation problem: The input belongs to $\mathbb{Z}_q^n$ and the goal is to compute $\sum_{i \in [n]} x_i \bmod q$. Similar to before, an $\ell$-message protocol consists of a randomizer $R$ and an analyzer $A$. The protocol is exact if the analyzer always outputs the correct answer. The protocol is $\sigma$-secure if, for all $X, X'$ such that $\sum_i x_i \equiv \sum_i x'_i \pmod q$, the statistical distance between $S(X)$ and $S(X')$ is at most $2^{-\sigma}$. Building on the “split and mix” protocol of [30], Balle et al. [7] and Ghazi et al. [24] gave the following protocol:
Theorem 28 ([7, 24]).
For any $q, n \in \mathbb{N}$ and $\sigma > 0$, there is a $\sigma$-secure $\ell$-message protocol for exact $\mathbb{Z}_q$-summation in the shuffle model with $\ell = O\big(1 + \frac{\sigma + \log q}{\log n}\big)$, where each message is from $\mathbb{Z}_q$.
Balle et al. [7] presented an elegant method to translate a secure $\mathbb{Z}_q$-summation protocol to a shuffle-DP real summation protocol. Below, we provide a slight generalization and improvement of their result (originally, their analysis has an additional error term due to “overflow / underflow”; we observe that this is in fact unnecessary, which reduces the claimed error and also simplifies the analysis), which eventually allows us to achieve Theorem 27.
Lemma 29 (generalization of [7] Lemma 5.2).
Suppose that, for some $\varepsilon > 0$ and $\Delta \in \mathbb{N}$, there is a zero-mean discrete infinitely divisible distribution $\mathcal{D}$ such that the $\mathcal{D}$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$. Furthermore, suppose that, for some $q \ge 3n\Delta$ and $\sigma > 0$, there exists an $n$-party $\sigma$-secure $\ell$-message exact $\mathbb{Z}_q$-summation protocol in the shuffle model. Then, there is an $\ell$-message $\varepsilon$-shuffle-DP protocol for real summation with MSE $O\big(\frac{n + \mathrm{Var}(\mathcal{D})}{\Delta^2}\big)$.
Moreover, the message length is the same as in the $\mathbb{Z}_q$-summation protocol.
Proof.
Let $(R', A')$ be the $\sigma$-secure exact $\mathbb{Z}_q$-summation protocol. Our protocol is $(R, A)$, where $R$ and $A$ are defined as follows (the final protocol is done by composing the randomizer / analyzer below with those of $(R', A')$):
-
1.
$R$ on input $x_i \in [0, 1]$ works by first computing a randomized encoding $\hat{x}_i = \lfloor \Delta x_i \rfloor + \mathrm{Ber}(\Delta x_i - \lfloor \Delta x_i \rfloor)$, so that $\mathbb{E}[\hat{x}_i] = \Delta x_i$. It then outputs $R'(y_i)$ where $y_i = (\hat{x}_i + Z_i) \bmod q$ and $Z_i \sim \mathcal{D}_{1/n}$.
-
2.
$A$ decodes the result by returning $\mathrm{clip}_{[0, n]}(\tilde{w} / \Delta)$, where $\tilde{w}$ is the representative of the $\mathbb{Z}_q$-summation output in the length-$q$ interval centered on $[0, n\Delta]$.
(Privacy) The proof of privacy proceeds identically to [7, Lemma 5.2] and is omitted here.
(Accuracy) Since $(R', A')$ is an exact $\mathbb{Z}_q$-summation protocol, its output is exactly equal to $W = (\hat{S} + Z) \bmod q$, where $\hat{S} = \sum_{i \in [n]} \hat{x}_i$ is the (rounded) sum and $Z = \sum_{i \in [n]} Z_i$ is distributed as $\mathcal{D}$.
Let $\hat{A} = \mathrm{clip}_{[0, n\Delta]}(\tilde{w})$ denote the rescaled decoded output, so that $A$ returns $\hat{A}/\Delta$. The first step in our accuracy analysis is a claim (this claim is indeed our improvement over the error analysis of [7, Lemma 5.2]) that $|\hat{A} - \hat{S}| \le |Z|$ always. To see that this is true, recall $\hat{S} \in [0, n\Delta]$ and consider the following cases:
-
Case I: $Z = 0$. We have $\hat{A} = \hat{S}$ and thus the inequality holds as an equality.
-
Case II: $\tilde{w} = \hat{S} + Z \in [0, n\Delta]$. In this case, $\hat{A} = \hat{S} + Z$, so $|\hat{A} - \hat{S}| = |Z|$.
-
Case III: $\hat{S} + Z > n\Delta$. Either $\tilde{w} = \hat{S} + Z$, in which case clipping gives $\hat{A} \in [\hat{S}, \hat{S} + Z]$; or $\hat{S} + Z$ overflows past the top of the decoding interval, which (since the interval extends at least $n\Delta$ beyond $n\Delta$, by $q \ge 3n\Delta$) forces $|Z| \ge n\Delta \ge |\hat{A} - \hat{S}|$. Thus, $|\hat{A} - \hat{S}| \le |Z|$.
-
Case IV: $\hat{S} + Z < 0$. This is symmetric to Case III, so $|\hat{A} - \hat{S}| \le |Z|$.
Thus, in all cases, we have $|\hat{A} - \hat{S}| \le |Z|$. Therefore, the MSE is at most
\[ \mathbb{E}\Big[\big(\tfrac{\hat{A}}{\Delta} - \textstyle\sum_i x_i\big)^2\Big] \le \frac{2}{\Delta^2}\Big(\mathbb{E}\big[(\hat{S} - \Delta \textstyle\sum_i x_i)^2\big] + \mathbb{E}[Z^2]\Big) \le \frac{2}{\Delta^2}\Big(\frac{n}{4} + \mathrm{Var}(\mathcal{D})\Big) = O\Big(\frac{n + \mathrm{Var}(\mathcal{D})}{\Delta^2}\Big). \]
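A compact sketch of the randomizer/analyzer pair from this proof (ours; the secure $\mathbb{Z}_q$-summation protocol $(R', A')$ and the share sampler for $\mathcal{D}_{1/n}$ are abstracted as function arguments, since Lemma 29 uses them only as black boxes):

```python
import math
import random

def randomize(x, delta, q, sample_share, encode):
    """R: randomized-round x in [0,1] to an integer, add this party's noise
    share, reduce mod q, and hand the result to the secure-summation encoder."""
    base = math.floor(delta * x)
    x_hat = base + (1 if random.random() < delta * x - base else 0)
    return encode((x_hat + sample_share()) % q)

def analyze(w, n, delta, q):
    """A: decode the Z_q sum to the representative centered on [0, n*delta],
    then rescale and clip; the clipping is what removes the overflow /
    underflow error term from the original analysis in [7]."""
    upper = n * delta + (q - n * delta) // 2  # top of the decoding interval
    w_tilde = w if w <= upper else w - q
    return min(max(w_tilde / delta, 0.0), float(n))
```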
We can now prove Theorem 27 by plugging our noise into the above lemma.
Proof of Theorem 27.
Let $\Delta = \lceil \sqrt{n} \, e^{\varepsilon/3} \rceil$ and $q = 3n\Delta$. From Corollary 19, there is a zero-mean discrete infinitely divisible distribution $\mathcal{D}$ such that the $\mathcal{D}$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ with $\mathrm{Var}(\mathcal{D}) = O(\Delta^2 e^{-2\varepsilon/3})$. Furthermore, Theorem 28 (with $\sigma = \Theta(\varepsilon + \log n)$) ensures that there exists an $n$-party $\sigma$-secure $\ell$-message exact $\mathbb{Z}_q$-summation protocol with $\ell = O\big(1 + \frac{\sigma + \log q}{\log n}\big) = O\big(1 + \frac{\varepsilon}{\log n}\big)$. Thus, Lemma 29 implies that there is an $\varepsilon$-shuffle-DP $\ell$-message protocol for real summation where each message is of length $O(\log q) = O(\varepsilon + \log n)$ bits and MSE
\[ O\Big(\frac{n + \mathrm{Var}(\mathcal{D})}{\Delta^2}\Big) = O\Big(\frac{n}{n \, e^{2\varepsilon/3}} + e^{-2\varepsilon/3}\Big) = O\big(e^{-2\varepsilon/3}\big). \]
In the above proof, we use the noise from Corollary 19, which is a modified version of MSDLap. We note that the continuous noises in Section 4 or [39] cannot be used here, as the protocol must round each contribution to a finite group prior to summing. Neither the privacy nor the infinite divisibility of the resulting sum distribution is clear in this case.
7 Conclusion
This work closes the utility gap for infinitely divisible mechanisms in the high $\varepsilon$ pure DP regime: we find no separation, in either the discrete or continuous setting, from restricting the private mechanism to infinitely divisible noise addition. The “staircase-like” MSE of both GDL and MSDLap in the low-privacy regime makes them a natural replacement for the staircase mechanism in the discrete distributed setting, and we hope the results introduced here can be of practical value. We show one such practical application by extending the “split and mix” protocol of [30] under shuffle DP, resolving the open question posed in [23].
Beyond improving utility, we believe GDL is of independent interest for distributed private mechanism design due to the fact that it is closed under summation. This makes it well-suited for cases where “smooth” privacy guarantees are needed for multiple outcomes, or for a single deployed system (like a secure aggregation MPC protocol over many clients, e.g. [9]) where different people can make different assumptions about what fraction of the involved clients is honest. Additionally, this property is useful in post-hoc privacy loss analysis when honesty assumptions are broken in a production system, where otherwise an analyst must resort to numerical approximation of the realized privacy loss.
In the continuous setting, our continuous transformations of all of the mechanisms in Section 3 outperform the existing Arete mechanism (Figure 2). We also note a strong resemblance between the Arete and the GDL distributions, as the NB distribution converges to the gamma distribution under certain conditions. See Section A.4 for more detail.
Finally, we hope our optimized MSDLap sampler (Algorithm 3) or its constituent parts can be useful in any context where sparse, exact multivariate random generation is needed, or even where a general NB sampler needs to be sublinear in $r$ (Algorithm 1). For multivariate sampling in the sparse regime where most samples are 0, our approach should be a large improvement over standard methods. We note that Algorithm 2 can be extended in a straightforward way (due to Lemma 23) to support variates with different $r$ values. The only change to the algorithm is initializing the urn with varying numbers of balls.
Open questions. In preparing this paper, we studied the Arete mechanism [39] extensively and are convinced it can also achieve the order optimal MSE of $O(\Delta^2 e^{-2\varepsilon/3})$, but the formal proof eluded us. We present the following conjecture as an open question (even if this conjecture were proven, the constant factors for the Arete's MSE still underperform the continuous multi-scale discrete Laplace):
Conjecture 30.
Let $\varepsilon \ge 1$ and $\Delta = 1$. The $\mathrm{Arete}(\alpha, \theta, \lambda)$-noise addition mechanism, with parameters suitably tuned as functions of $\varepsilon$, is $\varepsilon$-DP for sensitivity $\Delta$ and has MSE $O(e^{-2\varepsilon/3})$.
Our work also raises the following questions:
-
How close to the optimal MSE, including constants, can one get in the high $\varepsilon$ regime for continuous and infinitely divisible mechanisms? While we match the constants of the discrete staircase for large $\varepsilon$, there is still a sizable gap in the continuous case.
-
Can the number of messages required for our shuffle-DP protocol in Theorem 27 be improved? Recall that we use the approach of [7] to translate a secure summation protocol to a shuffle-DP one. In fact, there is a more direct approach by [23] that achieves a smaller number of messages. Since their proof also uses infinitely divisible noises, it is plausible that our noise distribution can be used there. However, their proof is considerably more involved compared to [7] and does not use the noise distributions in a black-box manner.
References
- [1] John M Abowd. The US Census Bureau adopts differential privacy. In KDD, pages 2867–2867, 2018.
- [2] Naman Agarwal, Peter Kairouz, and Ziyu Liu. The skellam mechanism for differentially private federated learning. NeurIPS, pages 5052–5064, 2021.
- [3] Horst Alzer and Graham Jameson. A harmonic mean inequality for the digamma function and related results. Rendiconti del Seminario Matematico della Università di Padova, 137:203–209, 2017.
- [4] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.
- [5] Andreas Athanasiou, Konstantinos Chatzikokolakis, and Catuscia Palamidessi. Enhancing metric privacy with a shuffler. In PETS 2025 - 25th Privacy Enhancing Technologies Symposium, 2025.
- [6] Eugene Bagdasaryan, Peter Kairouz, Stefan Mellem, Adrià Gascón, Kallista Bonawitz, Deborah Estrin, and Marco Gruteser. Towards sparse federated analytics: Location heatmaps under distributed differential privacy with secure aggregation. Proceedings on Privacy Enhancing Technologies, 4:162–182, 2022. doi:10.56553/POPETS-2022-0104.
- [7] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. Private summation in the multi-message shuffle model. In CCS, pages 657–676, 2020. doi:10.1145/3372297.3417242.
- [8] James Henry Bell, Kallista A. Bonawitz, Adrià Gascón, Tancrède Lepoint, and Mariana Raykova. Secure single-server aggregation with (poly)logarithmic overhead. In CCS, pages 1253–1269, 2020. doi:10.1145/3372297.3417885.
- [9] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In CCS, pages 1175–1191, New York, NY, USA, 2017. doi:10.1145/3133956.3133982.
- [10] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In TCC, pages 635–658, 2016. doi:10.1007/978-3-662-53641-4_24.
- [11] Clément L Canonne, Gautam Kamath, and Thomas Steinke. The discrete gaussian for differential privacy. NeurIPS, pages 15676–15688, 2020.
- [12] Clément L. Canonne and Hongyi Lyu. Uniformity testing in the shuffle model: Simpler, better, faster. In SOSA, pages 182–202, 2022. doi:10.1137/1.9781611977066.13.
- [13] Albert Cheu, Adam D. Smith, Jonathan R. Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via shuffling. In EUROCRYPT, pages 375–403, 2019. doi:10.1007/978-3-030-17653-2_13.
- [14] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486–503, 2006. doi:10.1007/11761679_29.
- [15] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006. doi:10.1007/11681878_14.
- [16] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In SODA, pages 2468–2479, 2019. doi:10.1137/1.9781611975482.151.
- [17] Quan Geng, Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The staircase mechanism in differential privacy. IEEE Journal of Selected Topics in Signal Processing, 9(7):1176–1184, 2015. doi:10.1109/JSTSP.2015.2425831.
- [18] Quan Geng and Pramod Viswanath. The optimal mechanism in differential privacy. In ISIT, pages 2371–2375, 2014. doi:10.1109/ISIT.2014.6875258.
- [19] Quan Geng and Pramod Viswanath. The optimal noise-adding mechanism in differential privacy. IEEE Transactions on Information Theory, 62(2):925–951, 2016. doi:10.1109/TIT.2015.2504967.
- [20] Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. On the power of multiple anonymous messages: Frequency estimation and selection in the shuffle model of differential privacy. In EUROCRYPT, pages 463–488, 2021. doi:10.1007/978-3-030-77883-5_16.
- [21] Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. Pure-dp aggregation in the shuffle model: Error-optimal and communication-efficient. In ITC, pages 4:1–4:13, 2024. doi:10.4230/LIPICS.ITC.2024.4.
- [22] Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Rasmus Pagh. Private counting from anonymous messages: Near-optimal accuracy with vanishing communication overhead. In ICML, pages 3505–3514, 2020. URL: http://proceedings.mlr.press/v119/ghazi20a.html.
- [23] Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Amer Sinha. Differentially private aggregation in the shuffle model: Almost central accuracy in almost a single message. In ICML, pages 3692–3701, 2021. URL: https://proceedings.mlr.press/v139/ghazi21a.html.
- [24] Badih Ghazi, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Private aggregation from fewer anonymous messages. In Anne Canteaut and Yuval Ishai, editors, EUROCRYPT, pages 798–827, 2020. doi:10.1007/978-3-030-45724-2_27.
- [25] Arpita Ghosh, Tim Roughgarden, and Mukund Sundararajan. Universally utility-maximizing privacy mechanisms. SIAM J. Comput., 41(6):1673–1693, 2012. doi:10.1137/09076828X.
- [26] Slawomir Goryczka and Li Xiong. A comprehensive comparison of multiparty secure additions with differential privacy. IEEE transactions on dependable and secure computing, 14(5):463–477, 2015.
- [27] Charlie Harrison. Distributed noise addition and infinite divisibility. https://charlieharrison.xyz/2024/08/05/distributed-noise.html, 2024. Accessed: 2025-02-11.
- [28] Creighton Heaukulani and Daniel M Roy. Black-box constructions for exchangeable sequences of random multisets. arXiv preprint arXiv:1908.06349, 2019.
- [29] Wolfram Research, Inc. Mathematica, Version 12.0. Champaign, IL, 2019.
- [30] Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography from anonymity. In FOCS, pages 239–248, USA, 2006. doi:10.1109/FOCS.2006.25.
- [31] Norman L. Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan. Discrete Multivariate Distributions, chapter 40, pages 200–203. Wiley, 1997.
- [32] Charles H Jones. Generalized hockey stick identities and n-dimensional blockwalking. The Fibonacci Quarterly, 34(3):280–288, 1996.
- [33] Peter Kairouz, Ziyu Liu, and Thomas Steinke. The distributed discrete gaussian mechanism for federated learning with secure aggregation. In ICML, pages 5201–5212, 2021. URL: http://proceedings.mlr.press/v139/kairouz21a.html.
- [34] Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu. Practical and private (deep) learning without sampling or shuffling. In ICML, pages 5213–5225, 2021. URL: http://proceedings.mlr.press/v139/kairouz21b.html.
- [35] Seetha Lekshmi V and Simi Sebastian. A skewed generalized discrete laplace distribution. IJMSI, 2:95–102, 2014.
- [36] Ilya Mironov. On significance of the least significant bits for differential privacy. In CCS, pages 650–661, 2012. doi:10.1145/2382196.2382264.
- [37] Ilya Mironov. Rényi Differential Privacy. In CSF, pages 263–275, 2017. doi:10.1109/CSF.2017.11.
- [38] F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds. NIST Digital Library of Mathematical Functions. Release 1.2.2 of 2024-09-15, 2024. URL: https://dlmf.nist.gov/15.2.
- [39] Rasmus Pagh and Nina Mesing Stausholm. Infinitely divisible noise in the low privacy regime. In ALT, pages 881–909, 2022. URL: https://proceedings.mlr.press/v167/pagh22a.html.
- [40] Feng Qi. Bounds for the ratio of two gamma functions. Journal of Inequalities and Applications, 2010:1–84, 2010.
- [41] Aaron Schein, Zhiwei Steven Wu, Alexandra Schofield, Mingyuan Zhou, and Hanna Wallach. Locally private bayesian inference for count models. In ICML, pages 5638–5648, 2019. URL: http://proceedings.mlr.press/v97/schein19a.html.
- [42] F William Townes. Review of probability distributions for modeling count data. arXiv preprint arXiv:2001.04343, 2020.
- [43] Filipp Valovich and Francesco Alda. Computational differential privacy from lattice-based cryptography. In International Conference on Number-Theoretic Methods in Cryptology, pages 121–141, 2017.
- [44] Zheng Xu, Yanxiang Zhang, Galen Andrew, Christopher A. Choquette-Choo, Peter Kairouz, H. Brendan McMahan, Jesse Rosenstock, and Yuanbo Zhang. Federated learning of gboard language models with differential privacy. In ACL: Industry Track, pages 629–639, 2023. doi:10.18653/V1/2023.ACL-INDUSTRY.60.
- [45] Mingyuan Zhou. Nonparametric bayesian negative binomial factor analysis. Bayesian Analysis, 13, April 2016. doi:10.1214/17-BA1070.
Appendix A Appendix
A.1 Deferred Proof of Lemma 12
Before we prove Lemma 12, it will be convenient to state the following simple lemma.
Lemma 31.
Let $f$ be log convex on the interval $I$. Then for any $x_1 \le x_2$ and $t \ge 0$ such that $x_1, x_2 + t \in I$:
\[ f(x_1 + t) \, f(x_2) \le f(x_1) \, f(x_2 + t). \]
Proof.
From log-convexity, writing $x_1 + t$ and $x_2$ as convex combinations of $x_1$ and $x_2 + t$, we have $f(x_1 + t) \le f(x_1)^{1-s} f(x_2 + t)^{s}$ and $f(x_2) \le f(x_1)^{s} f(x_2 + t)^{1-s}$, where $s = \frac{t}{x_2 + t - x_1}$ (the degenerate case $x_1 = x_2 + t$ is trivial). Multiplying the two yields the claimed inequality.
We are now ready to prove Lemma 12.
Proof of Lemma 12.
Assume w.l.o.g. that $d > 0$. There are three cases to consider.
-
1.
$x \ge 0$ and $x + d \ge 0$. Applying Lemma 31 with $x_1 = 0$, $x_2 = x$, and $t = d$, we have $\frac{f(x)}{f(x + d)} \le \frac{f(0)}{f(d)}$.
-
2.
$x \le -d$ and $x + d \le 0$. By symmetry, $\frac{f(x)}{f(x + d)} = \frac{f(-x)}{f(-x - d)} \le 1 \le \frac{f(0)}{f(d)}$, since $-x \ge -x - d \ge 0$ and $f$ is decreasing on $[0, \infty)$.
-
3.
$-d < x < 0$ and $x + d > 0$. By $f$ decreasing on $[0, \infty)$ and symmetry, we have $f(x) = f(-x) \le f(0)$ and $f(x + d) \ge f(d)$. Combining these (together with Lemma 31 for the boundary cases) concludes the proof.
A.2 Alternative simplified GDL privacy bound
In this section we will outline a tighter version of Corollary 14, which only well-approximates the privacy loss in the small $\theta$ regime. The bound below well-approximates the privacy loss in all regimes, at the cost of some added complexity in the expression.
Corollary 32 (to Theorem 13).
For any $p \in (0, 1)$ and $\theta \in (0, 1]$, the $\mathrm{GDL}(\theta, p)$-noise addition mechanism is $\tilde{\varepsilon}$-DP for sensitivity $\Delta$, where
\[ \tilde{\varepsilon} = \Delta \ln(1/p) + \ln\!\big(\Gamma(\theta) \cdot (\Delta + \theta)^{1 - \theta}\big) + \theta \ln \frac{1}{1 - p^2}. \]
Proof.
The proof is identical to Corollary 14, except in the last step, where we use Wendel's double inequality [40, eq. 2.6]:
\[ \left(\frac{x}{x + s}\right)^{1 - s} \le \frac{\Gamma(x + s)}{x^s \, \Gamma(x)} \le 1 \qquad \text{for } x > 0, \; s \in (0, 1), \]
which yields $\frac{\Delta!}{(\theta)_\Delta} = \frac{\Gamma(\Delta + 1) \, \Gamma(\theta)}{\Gamma(\Delta + \theta)} \le \Gamma(\theta) \cdot (\Delta + \theta)^{1 - \theta}$.
A.3 Analytical variance of the discrete staircase distribution
In this section we derive the analytical variance of the discrete staircase using the Mathematica software [29], for the purposes of generating Figure 1.
Definition 33 ([18]).
The discrete staircase distribution with parameters $\varepsilon$, $\Delta$, and $r \in [\Delta]$ is defined as
\[ \Pr[X = k] = a(r) \cdot e^{-\varepsilon \lfloor (|k| + \Delta - r)/\Delta \rfloor} \qquad \text{for } k \in \mathbb{Z}, \]
where $a(r) = \big(2r - 1 + 2\Delta \cdot \frac{e^{-\varepsilon}}{1 - e^{-\varepsilon}}\big)^{-1}$ is the normalization constant.
Lemma 34.
Let $X$ follow the discrete staircase distribution with parameters $\varepsilon, \Delta, r$, and let $b = e^{-\varepsilon}$. Then $\mathrm{Var}(X)$ admits a closed form, polynomial in $\Delta$ and $r$ and rational in $b$, which we obtain by symbolic summation below and use to generate Figure 1.
Proof.
Let $b = e^{-\varepsilon}$. We will first compute the variance contribution from just the central “stair”, $2 a(r) \sum_{k=1}^{r-1} k^2$. We then compute the variance from the full support of the distribution,
\[ \mathrm{Var}(X) = a(r) \left( 2 \sum_{k=1}^{r-1} k^2 + 2 \sum_{j=1}^{\infty} b^{\,j} \sum_{k=(j-1)\Delta + r}^{j\Delta + r - 1} k^2 \right). \]
The result follows from symbolic simplification in Mathematica, replacing $e^{-\varepsilon}$ terms with $b$ for ease of presentation.
We also derive the following concrete bound for large $\varepsilon$, which facilitates a comparison with our MSDLap noise.
Observation 35.
For any $\Delta \in \mathbb{N}$ and any sufficiently large $\varepsilon$, the variance of the discrete staircase distribution (for any value of $r$) is at least $(1 - o(1)) \cdot \frac{\Delta(\Delta + 1)(2\Delta + 1)}{3} e^{-\varepsilon}$, where $o(1) \to 0$ as $\varepsilon \to \infty$.
Proof.
First, notice that, if $r \ge 2$, then $\Pr[|X| \ge 1] \ge 2 a(r) = \Omega(1/r)$. Since the noise is zero-mean, the variance is thus at least $\Omega(1/r) = \Omega(1/\Delta)$, which is at least $\frac{\Delta(\Delta + 1)(2\Delta + 1)}{3} e^{-\varepsilon}$ by our condition on $\varepsilon$ being sufficiently large.
Next, consider the case $r = 1$. In this case, the variance is exactly
\[ 2 a(1) \sum_{j=1}^{\infty} e^{-j\varepsilon} \sum_{k=(j-1)\Delta + 1}^{j\Delta} k^2 \ge 2 a(1) \, e^{-\varepsilon} \sum_{k=1}^{\Delta} k^2 = a(1) \, e^{-\varepsilon} \cdot \frac{\Delta(\Delta + 1)(2\Delta + 1)}{3}, \]
where $a(1) = \big(1 + 2\Delta \frac{e^{-\varepsilon}}{1 - e^{-\varepsilon}}\big)^{-1} = 1 - o(1)$ is as defined in Definition 33.
A.4 Arete convergence
In this section we show a link between the GDL distribution and the Arete distribution from [39].
Definition 36.
For $\alpha, \theta, \lambda > 0$, let $\mathrm{Arete}(\alpha, \theta, \lambda)$ denote the distribution of $X_1 - X_2 + Y$ where $X_1, X_2 \sim \Gamma(\alpha, \theta)$ and $Y \sim \mathrm{Lap}(\lambda)$ are independent.
The main technical lemma linking the GDL and Arete distributions follows from showing how the negative binomial distribution converges to the gamma distribution.
Lemma 37.
Let $r > 0$ and, for $p \in (0, 1)$, let $X_p \sim \mathrm{NB}(r, p)$. Then $(1 - p) \cdot X_p \to \Gamma(r, 1)$ in distribution as $p \to 1$.
Proof.
By Lévy's continuity theorem, convergence in distribution follows from pointwise convergence of the characteristic function. Denote by $\varphi_p(t)$ the characteristic function of $(1 - p) X_p$; from the NB probability generating function,
\[ \varphi_p(t) = \left( \frac{1 - p}{1 - p \, e^{i(1-p)t}} \right)^{r}. \]
Finally, applying L'Hôpital's rule to the exponent shows that $\varphi_p(t) \to (1 - it)^{-r}$ as $p \to 1$, which is the characteristic function of $\Gamma(r, 1)$.
Proposition 38.
Let $X_p \sim \mathrm{GDL}(\theta, p)$. Then $(1 - p) \cdot X_p \to \Gamma(\theta, 1) - \Gamma(\theta, 1)$ (a difference of two independent gamma r.v.s) in distribution as $p \to 1$.
Proof.
This follows from Definition 9, Definition 36, and Lemma 37.
Remark 39.
The convergence result in Proposition 38 shows that the Arete mechanism is quite similar to the GDL mechanism transformed to real support via the approach in Theorem 21. The primary difference is how fine the discretization gets. Using Theorem 21, we only set $\Delta' = \lceil e^{\varepsilon/3} \rceil$, allowing the resulting (discrete) distribution to have “holes” in its support. The purpose of the Laplace noise in that case is to smooth out the holes and ensure support on all of $\mathbb{R}$. On the other hand, the Arete (via Proposition 38) corresponds to the limit $p \to 1$ (with no holes in its support), and the purpose of the Laplace noise is to smooth out the resulting singularity at 0 for sufficiently small values of $\alpha$. (See [39, Page 3] for further discussion on this point.) As such, the proof technique for Theorem 21 cannot be immediately used to help prove (or disprove) Conjecture 30.
A.5 Parameterized Difference Set and The multi-scale discrete Laplace Mechanism
Recall that, for privacy analysis, we usually only consider the sensitivity of the function $f$, which is defined as $\Delta(f) = \max |f(X) - f(X')|$ where the maximum is over all pairs $X$ and $X'$ differing on one entry. In this section, we show that, if we parameterize the potential difference values in a more fine-grained manner, we can achieve an improved error in certain cases.
For a given query function $f$, we define the difference set of $f$, denoted by $\mathcal{D}(f)$, as the set of all possible values of $|f(X) - f(X')|$ over all pairs $X$ and $X'$ differing on one entry. If $\mathcal{D}(f)$ is finite, we let the $\mathcal{D}(f)$-multi-scale discrete Laplace mechanism be the mechanism that outputs $f(X) + \sum_{d \in \mathcal{D}(f)} d \cdot Y_d$ where $Y_d \sim \mathrm{DLap}(e^{-\varepsilon})$ i.i.d. for all $d \in \mathcal{D}(f)$.
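A sketch of this mechanism in Python (ours; it reuses the geometric-difference DLap sampler idea from Section 3.2 and takes the difference set as an explicit argument, which the analyst must derive from the query):

```python
import numpy as np

def difference_set_mechanism(true_value, diff_set, eps, rng):
    """Outputs f(X) + sum_{d in D(f)} d * Y_d with Y_d ~ DLap(e^{-eps}) i.i.d.
    MSE is O(e^{-eps} * sum of d^2) for eps >= 1 (Theorem 40)."""
    p = np.exp(-eps)
    noise = 0
    for d in diff_set:
        y = int(rng.geometric(1.0 - p)) - int(rng.geometric(1.0 - p))  # ~ DLap(p)
        noise += d * y
    return true_value + noise

rng = np.random.default_rng(7)
# Merchant example below: prices are powers of two, so D(f) = {1, 2, 4, ...}.
print(difference_set_mechanism(1234, [1, 2, 4, 8, 16], eps=3.0, rng=rng))
```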
Theorem 40.
For any query function $f$ such that $\mathcal{D}(f)$ is finite, the $\mathcal{D}(f)$-multi-scale discrete Laplace mechanism is $\varepsilon$-DP. Furthermore, for $\varepsilon \ge 1$, its MSE is $O\big(e^{-\varepsilon} \sum_{d \in \mathcal{D}(f)} d^2\big)$.
Proof.
(Privacy) To show that this mechanism is $\varepsilon$-DP, it suffices to show that
\[ D_\infty\big(\mathcal{M}(X) \,\|\, \mathcal{M}(X')\big) \le \varepsilon \]
for any pair $X, X'$ that differs on one entry. Consider any such fixed pair of $X, X'$. Let $d^* = |f(X) - f(X')|$; due to symmetry of the noise around zero, we may assume that $f(X') \ge f(X)$. If $d^* = 0$, the statement is clearly true. Otherwise, from the definition of $\mathcal{D}(f)$, we have $d^* \in \mathcal{D}(f)$, and, exactly as in the proof of Theorem 16, shifting the single component $Y_{d^*}$ by one accounts for the difference at a max-divergence cost of at most $\varepsilon$.
(Accuracy) The MSE of the mechanism is
\[ \sum_{d \in \mathcal{D}(f)} d^2 \cdot \mathrm{Var}(Y_d) = \frac{2 e^{-\varepsilon}}{(1 - e^{-\varepsilon})^2} \sum_{d \in \mathcal{D}(f)} d^2 = O\Big(e^{-\varepsilon} \sum_{d \in \mathcal{D}(f)} d^2\Big), \]
where the last step uses $\varepsilon \ge 1$.
To see the advantage of the above mechanism, we note a few scenarios where this mechanism has MSE $O(\Delta^2 e^{-\varepsilon})$, but where approaches which consider the sensitivity alone must have MSE at least $\Omega\big(\min(\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3})\big)$ [18, 17, 19], which is asymptotically larger. In each example we consider a query with sensitivity $\Delta$.
-
$\mathcal{D}(f) = \{1, \Delta\}$, i.e., there is one large possible difference, but the remaining possible differences are small and far from $\Delta$.
-
$\mathcal{D}(f) = \{1, c, c^2, \dots\} \cap [\Delta]$ for fixed $c > 1$, i.e., differences are structured to form exponential buckets.
Finally, we note that the setting where $|\mathcal{D}(f)|$ is small (even when $\Delta$ is large) can occur in practice. As an example, imagine a simple merchant that sells items whose prices are all powers of two, and who wants to privately sum a database of sales where each row is the sale price of a single item. If a neighboring dataset adds or removes a row, it is clear the sensitivity of this query is the largest price $\Delta$. For a representative parameter setting in this regime, the continuous staircase has MSE (rounded to two significant figures) 8.5, but the MSDLap mechanism with the difference set above has MSE 1.0, resulting in nearly an order of magnitude improvement.