
Infinitely Divisible Noise for Differential Privacy:
Nearly Optimal Error in the High ε Regime

Charlie Harrison (Google, Austin, TX, USA) and Pasin Manurangsi (Google Research, Bangkok, Thailand)
Abstract

Differential privacy (DP) can be achieved in a distributed manner, where multiple parties add independent noise such that their sum protects the overall dataset with DP. A common technique here is for each party to sample their noise from the decomposition of an infinitely divisible distribution. We analyze two mechanisms in this setting: 1) the generalized discrete Laplace (GDL) mechanism, whose distribution (which is closed under summation) follows from differences of i.i.d. negative binomial shares, and 2) the multi-scale discrete Laplace (MSDLap) mechanism, a novel mechanism following the sum of multiple i.i.d. discrete Laplace shares at different scales. For $\varepsilon \ge 1$, our mechanisms can be parameterized to have $O(\Delta^3 e^{-\varepsilon})$ and $O(\min(\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3}))$ MSE, respectively, where $\Delta$ denotes the sensitivity; the latter bound matches known optimality results. Furthermore, the MSDLap mechanism has the optimal MSE including constants as $\varepsilon \to \infty$. We also show a transformation from the discrete setting to the continuous setting, which allows us to transform both mechanisms to the continuous setting and thereby achieve the optimal $O(\Delta^2 e^{-2\varepsilon/3})$ MSE. To our knowledge, these are the first infinitely divisible additive noise mechanisms that achieve order-optimal MSE under pure DP for either the discrete or continuous setting, so our work shows formally there is no separation in utility when query-independent noise-adding mechanisms are restricted to infinitely divisible noise. For the continuous setting, our result improves upon Pagh and Stausholm's Arete distribution, which gives an MSE of $O(\Delta^2 e^{-\varepsilon/4})$ [39]. Furthermore, we give an exact sampler tuned to efficiently implement the MSDLap mechanism, and we apply our results to improve a state-of-the-art multi-message shuffle DP protocol from [7] in the high $\varepsilon$ regime.

Keywords and phrases:
Differential Privacy, Distributed Noise Addition
Copyright and License:
© Charlie Harrison and Pasin Manurangsi; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Distribution functions; Security and privacy
Related Version:
Full Version: https://arxiv.org/abs/2504.05202
Acknowledgements:
Thanks to Peter Kairouz for useful discussions and pointing out the prior work, especially [6]. This work was supported by Google.
Editors:
Mark Bun

1 Introduction

Differential Privacy (DP) [15] is a formal notion of privacy which bounds the sensitive information revealed by an algorithm. While there are many “flavors” of DP, most relevant to this work is so-called pure-DP or ε-DP which bounds the privacy loss via the parameter ε.

Definition 1 ([15]).

A randomized mechanism $M: \mathcal{X}^d \to \mathcal{Y}$ is $\varepsilon$-DP if, for all $x, x' \in \mathcal{X}^d$ differing in a single entry (for this work, we may consider any neighboring notion; we use the substitution notion for simplicity), $\Pr[M(x) \in S] \le e^{\varepsilon} \Pr[M(x') \in S]$ for all measurable $S \subseteq \mathcal{Y}$.

We focus on the so-called low-privacy regime where $\varepsilon \ge 1$. Despite its name, this regime still provides meaningful privacy protection and is the setting most often employed in practical applications of DP (e.g. [1, 4, 44]). The bounds we state throughout will focus on this regime.

A challenge in deploying DP is doing so while also producing useful results. In this work, we focus on the mean squared error (MSE) of a query $q$ subject to a query-independent additive noise mechanism, i.e. $M(x) = q(x) + Z$ where $Z$ is a random variable. There is a rich body of work on optimizing MSE in this setting. Notably, the staircase mechanism [18, 17, 19] was shown to have the optimal MSE of all $\varepsilon$-DP, query-independent additive noise mechanisms. In the continuous setting, it achieves an MSE of $O(\Delta^2 e^{-2\varepsilon/3})$. In the discrete setting, its MSE interpolates between $O(\Delta^3 e^{-\varepsilon})$ and $O(\Delta^2 e^{-2\varepsilon/3})$. For $\Delta = 1$, the optimal discrete staircase mechanism is the discrete Laplace (aka geometric) mechanism [25].

A probability distribution $\mathcal{D}$ is infinitely divisible iff for every positive integer $n$, there exists a distribution $\mathcal{D}_{/n}$ such that, when we sample $n$ i.i.d. random variables $Z_1, \ldots, Z_n \sim \mathcal{D}_{/n}$, their sum $Z = \sum_{i=1}^{n} Z_i$ is distributed as $\mathcal{D}$. In distributed DP, a common technique (see [26] for an overview) is for $n$ parties to each sample $Z_i$ such that the sum is distributed according to $\mathcal{D}$, which can be shown to protect the dataset with DP. The infinite divisibility property of $\mathcal{D}$ allows for distributed protocols where an arbitrary number $n \ge 1$ of parties can participate. Under the more restrictive setting where the additive noise mechanism $M$ must sample the noise $Z$ from an infinitely divisible distribution, there was previously no known mechanism in either the discrete or continuous setting which matched the MSE of the staircase mechanism. We resolve this gap in this paper for both settings.

1.1 Related work

Distributed noise generation for differential privacy is well studied even for distributions that are not infinitely divisible. In fact, the idea dates back to the very early days of DP [14]. Moreover, several works have studied the setting where $Z_1, \ldots, Z_n$ are sampled from some distribution $\tilde{\mathcal{D}}$ and directly argue about the distribution of their summation $Z = Z_1 + \cdots + Z_n$. Examples include the case where $\tilde{\mathcal{D}}$ is a Bernoulli distribution [13, 20], for which $Z$ is a binomial random variable, and the case where $\tilde{\mathcal{D}}$ is a discrete Gaussian distribution [33], for which $Z$ is "close" to a discrete Gaussian random variable. The drawback here is that the distribution of $Z$ is different for different values of $n$, meaning that the privacy analysis often requires $n$ to be sufficiently large (e.g. [20]) or sufficiently small (e.g. [33]). Using an infinitely divisible distribution overcomes this issue, since the distribution of the total noise $Z$ is always $\mathcal{D}$ regardless of the value of $n$, leading to a privacy analysis that works for all regimes of $n$. Due to this, infinitely divisible noise distributions have gained popularity in distributed settings of differential privacy (e.g. [26, 7, 22, 23, 2, 12, 6, 21]).

As discrete distributions are typically easier to embed in multi-party cryptographic protocols and avoid implementation issues with floating-point representations [36], they tend to be more well-studied in distributed DP. The infinite divisibility of the discrete Laplace into negative binomial shares has been studied in [26, 7, 6]. (Throughout this work, we use the term negative binomial to refer to the distribution generalized to a real-valued stopping parameter $r$; in other works, this is sometimes called the Pólya distribution.) In [6], the authors explicitly analyzed privacy in the face of dropouts, i.e., parties that fail to properly add their noise share.

In the continuous setting, the infinitely divisible Arete distribution was introduced in [39] specifically to target the low-privacy, high-$\varepsilon$ pure-DP regime. It was designed to match the performance of the (continuous) staircase mechanism, which is not infinitely divisible and therefore unusable in the distributed setting. While the continuous staircase mechanism achieves $O(\Delta^2 e^{-2\varepsilon/3})$ MSE, the authors only proved that the Arete mechanism has an MSE of $O(\Delta^2 e^{-\varepsilon/4})$, though we believe this is not tight (Conjecture 30).

Distributed noise is relevant in other notions of differential privacy as well. The aforementioned discrete Gaussian mechanism [11, 33] satisfies zero-concentrated DP [10], and the Skellam mechanism [2, 43, 41] satisfies Rényi DP [37]. (Mechanisms satisfying Rényi DP or zero-concentrated DP naturally also satisfy approximate DP.) Both mechanisms were proposed in the context of federated learning via a secure aggregation multi-party protocol [34, 9, 8].

The GDL mechanism was explored informally in a prior blog post by the first author [27]. Concurrent to this work, the GDL distribution was studied in the context of $(\varepsilon, \delta)$ shuffle DP in [5]. There the authors analyzed a shifted and truncated GDL distribution that achieves an approximate DP guarantee, vs. the pure DP one in this work, which does not perform any truncation. Their primary result in the shuffle setting involves nearly matching the utility of the central discrete Laplace mechanism. On the other hand, as explained below, we improve on the central discrete Laplace mechanism for sufficiently large $\varepsilon$.

1.2 Our contributions

Table 1: Summary of our results (the GDL, MSDLap, and Continuous MSDLap rows) compared to known noise distributions satisfying $\varepsilon$-DP. MSEs exclude constant factors.

Discrete Distributions

Distribution | MSE | Inf. Div. | Reference
Discrete Laplace | $e^{-\varepsilon/\Delta}$ | Yes | [25]
Discrete Staircase | $\min\{\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3}\}$ | No | [18, 17, 19]
Generalized Discrete Laplace (GDL) | $\Delta^3 e^{-\varepsilon}$ for $\varepsilon > 2 + \log(\Delta)$ | Yes | Theorem 15
Multi-Scale Discrete Laplace (MSDLap) | $\min\{\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3}\}$ | Yes | Corollary 19

Continuous Distributions

Distribution | MSE | Inf. Div. | Reference
Laplace | $\Delta^2/\varepsilon^2$ | Yes | [15]
Staircase | $\Delta^2 e^{-2\varepsilon/3}$ | No | [18, 17, 19]
Arete | $\Delta^2 e^{-\varepsilon/4}$ | Yes | [39]
Continuous MSDLap | $\Delta^2 e^{-2\varepsilon/3}$ | Yes | Theorem 21

In Section 3, we introduce the GDL and MSDLap mechanisms, two discrete noise-adding mechanisms having optimal $O(\Delta^3 e^{-\varepsilon})$ MSE for fixed $\Delta$ and any sufficiently large $\varepsilon$. Inspired by the discrete staircase mechanism, we also extend the MSDLap mechanism with a parameter that optionally allows it to achieve $O(\Delta^2 e^{-2\varepsilon/3})$ MSE, giving asymptotically optimal error for any fixed $\Delta$ and $\varepsilon \ge 1$. Notably, the MSDLap mechanism matches the MSE of the discrete staircase, including constants, as $\varepsilon \to \infty$.

The GDL mechanism, as the difference of two i.i.d. negative binomial noise shares, generalizes the distributed discrete Laplace mechanisms in [26, 7, 6] to remain robust in the face of unexpected dropouts, i.e., when a larger fraction of parties than expected fail to add their noise shares, providing a "smooth" closed-form privacy guarantee.

In Section 4, we introduce a method to transform a discrete infinitely divisible additive mechanism into a continuous one, up to a small loss in parameters. This allows us to achieve the asymptotically optimal $O(\Delta^2 e^{-2\varepsilon/3})$ MSE by transforming either the GDL or the MSDLap mechanisms, improving on the Arete mechanism's bound of $O(\Delta^2 e^{-\varepsilon/4})$ from [39].

Our noise distributions and previously known distributions are summarized in Table 1.

While its utility exceeds the GDL’s, the MSDLap mechanism naively requires sampling from O(Δ) independent negative binomial random variables. In Section 5 we outline an improved exact sampling algorithm which runs in only O(1) steps in expectation for relevant regimes. This algorithm may be of independent interest for general purpose multivariate negative binomial sampling in the sparse regime where most samples are 0.

Finally, in Section 6, we improve the multi-message "split and mix" real summation shuffle protocol of [7] with our results, attaining an $O(e^{-2\varepsilon/3})$ bound on MSE where previous results could only achieve the $O(1/\varepsilon^2)$ of the discrete Laplace.

2 Preliminaries

For any $n \in \mathbb{N}$, we write $[n]$ as a shorthand for $\{1, \ldots, n\}$.

Function identities.

We introduce a few functions and identities used in our proofs. Let $\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt$ be the gamma function. We denote the rising factorial (aka the Pochhammer symbol) by $(x)_n := x(x+1)\cdots(x+n-1) = \frac{\Gamma(x+n)}{\Gamma(x)}$. Denote the hypergeometric function by ${}_2F_1[a, b; c; z]$ [38], where

$${}_2F_1[a, b; c; z] = \sum_{s=0}^{\infty} \frac{(a)_s (b)_s}{(c)_s\, s!} z^s = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)} \sum_{s=0}^{\infty} \frac{\Gamma(a+s)\Gamma(b+s)}{\Gamma(c+s)\, s!} z^s. \tag{1}$$

Let $\psi(z)$ denote the digamma function, where $\psi(z) = \frac{d}{dz} \log(\Gamma(z)) = \frac{\Gamma'(z)}{\Gamma(z)}$. The following observation is well-known (see e.g. [3]):

Observation 2.

$\psi$ is increasing and $\psi'$ is decreasing on $(0, \infty)$.

Finally, we state the so-called “hockey-stick identity” (e.g. [32]):

Lemma 3.

For any non-negative integers $\ell \ge m$, $\sum_{j=m}^{\ell} \binom{j}{m} = \binom{\ell+1}{m+1}$.

Distributions.

For convenience, we write a random variable (r.v.) and its distribution interchangeably.

For a discrete distribution $\mathcal{D}$ with support on $\mathcal{X}$, denote its probability mass function (PMF) as $f_{\mathcal{D}}(k)$ for $k \in \mathcal{X}$. When we say that a discrete distribution is infinitely divisible, we assume implicitly that the $\mathcal{D}_{/n}$ are also discrete. Relevant to this work are the following well-known discrete distributions (where $a, r > 0$, $p \in (0,1)$):

  • The negative binomial (NB) distribution, denoted $NB(r, p)$ and with support on $\mathbb{Z}_{\ge 0}$, has PMF $f_{NB(r,p)}(k) = (1-p)^k p^r \frac{\Gamma(k+r)}{\Gamma(r)\Gamma(k+1)}$. It is infinitely divisible, as $\sum_{i=1}^{n} NB(r/n, p) \sim NB(r, p)$. Its variance is $\mathrm{Var}(NB(r,p)) = \frac{(1-p)r}{p^2}$. For $r \in \mathbb{N}$, the negative binomial distribution models the number of failures before the $r$-th success in a sequence of i.i.d. Bernoulli trials with success probability $p$.

  • The geometric distribution, denoted $\mathrm{Geo}(p)$, is the special case of NB with $r = 1$.

  • The discrete Laplace distribution, denoted $DLap(a)$ and with support on $\mathbb{Z}$, has PMF $f_{DLap(a)}(k) = \tanh(a/2)\, e^{-a|k|}$. Since $DLap(a) \sim NB(1, 1-e^{-a}) - NB(1, 1-e^{-a})$, it inherits infinite divisibility from NB (a distributed sampling sketch follows this list). Its variance is $\mathrm{Var}(DLap(a)) = \frac{1}{\cosh(a) - 1}$.

  • The Bernoulli distribution, denoted $Ber(p)$, has PMF $f_{Ber(p)}(1) = p$ and $f_{Ber(p)}(0) = 1 - p$.
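The NB decomposition of the discrete Laplace is the basic primitive for distributed noise throughout this paper. As a concrete illustration, the following minimal Python sketch (ours, not from the paper; it assumes NumPy, whose `negative_binomial` accepts a real-valued $r$ matching the Pólya convention used here) shows $n$ parties jointly sampling $DLap(a)$ noise:

```python
# Minimal sketch of distributed DLap(a) via negative binomial shares.
import numpy as np

rng = np.random.default_rng(0)

def dlap_share(a: float, n: int) -> int:
    # One party's share: NB(1/n, 1-e^{-a}) - NB(1/n, 1-e^{-a}).
    p = 1.0 - np.exp(-a)
    return int(rng.negative_binomial(1.0 / n, p)) - int(rng.negative_binomial(1.0 / n, p))

# The sum of n i.i.d. shares is distributed as DLap(a).
n, a = 10, 1.0
noise = sum(dlap_share(a, n) for _ in range(n))
```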

For a continuous distribution $\mathcal{D}$ on $\mathcal{X}$, let $f_{\mathcal{D}}(x)$ for $x \in \mathcal{X}$ be its probability density function (PDF) at $x$. Recall the following continuous distributions (where $k, \theta, b > 0$):

  • The gamma distribution, denoted $\Gamma(k, \theta)$, with support on $\mathbb{R}_+$ has PDF $f_{\Gamma(k,\theta)}(x) = \frac{1}{\Gamma(k)\theta^k} x^{k-1} e^{-x/\theta}$. It is infinitely divisible, as $\sum_{i=1}^{n} \Gamma(k/n, \theta) \sim \Gamma(k, \theta)$.

  • The Laplace distribution, denoted $Lap(b)$, with support on $\mathbb{R}$ has PDF $f_{Lap(b)}(x) = \frac{1}{2b} e^{-|x|/b}$. Its variance is $2b^2$. It is infinitely divisible, as $\sum_{i=1}^{n} \left(\Gamma(1/n, b) - \Gamma(1/n, b)\right) \sim Lap(b)$.

We will also use the following simple observations (where Z,Z1,Z2 are r.v.s):

Observation 4.

If Z is infinitely divisible, cZ is infinitely divisible for any constant c.

Observation 5.

If Z1,Z2 are infinitely divisible and independent, Z1+Z2 is infinitely divisible.

We say that a distribution $\mathcal{D}$ is closed under summation if $\mathcal{D}$ is infinitely divisible and, additionally, $\mathcal{D}_{/n}$ follows the same distribution family as $\mathcal{D}$ for all $n$. This additional property provides benefits in the distributed setting, as it ensures the mechanism's privacy is well understood even as parties drop out of or join the protocol.

Max Divergence and DP.

Let $D_\infty(P \| Q)$ denote the max divergence between two distributions $P, Q$, i.e., $\sup_{x \in \mathrm{supp}(P)} \log\frac{f_P(x)}{f_Q(x)}$. We state the following well-known properties.

Lemma 6 (Post-Processing).

For any (possibly randomized) function $f$ and any random variables $U, V$, we have $D_\infty(f(U) \| f(V)) \le D_\infty(U \| V)$.

Lemma 7 (Triangle Inequality).

For any distributions $P, Q, R$: $D_\infty(P \| Q) \le D_\infty(P \| R) + D_\infty(R \| Q)$.

For a query function $q: \mathcal{X}^d \to \mathbb{R}$, we let $\Delta(q) = \max_{x, x'} |q(x) - q(x')|$ where the maximum is over all pairs $x$ and $x'$ differing on one entry. The $\mathcal{D}$-noise addition mechanism for a query function $q$ is the mechanism $M(x)$ that outputs $q(x) + Z$ where $Z$ is drawn from $\mathcal{D}$. For a discrete (resp. continuous) distribution $\mathcal{D}$ and $\Delta \in \mathbb{N}$ (resp. $\Delta > 0$), we say the $\mathcal{D}$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ iff the $\mathcal{D}$-noise addition mechanism is $\varepsilon$-DP for all queries $q: \mathcal{X}^d \to \mathbb{Z}$ (resp. $q: \mathcal{X}^d \to \mathbb{R}$) such that $\Delta(q) \le \Delta$. It follows from the definition of DP and max divergence that this condition translates to the following (see e.g. [18]):

Lemma 8.

For a discrete (resp. continuous) distribution $\mathcal{D}$, the $\mathcal{D}$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ iff $D_\infty(\mathcal{D} + \xi \| \mathcal{D}) \le \varepsilon$ for all $\xi \in \{-\Delta, -(\Delta-1), \ldots, \Delta\}$ (resp. if $D_\infty(\mathcal{D} + \xi \| \mathcal{D}) \le \varepsilon$ for all $\xi \in [-\Delta, \Delta]$; the other direction of the implication does not hold in the continuous case since, e.g., the set of $x$ where $f_{\mathcal{D}+\xi}(x) > e^{\varepsilon} f_{\mathcal{D}}(x)$ might have measure zero).

3 Discrete mechanisms

We present two infinitely divisible discrete noises with order-optimal MSEs.

3.1 The generalized discrete Laplace mechanism

This section introduces the generalized discrete Laplace distribution and the associated mechanism. We start with the description of the distribution and its PMF:

Definition 9 ([35]).

For $\beta, a > 0$, let $GDL(\beta, a)$ denote the distribution of $Z_1 - Z_2$ where $Z_1, Z_2 \overset{i.i.d.}{\sim} NB(\beta, 1-e^{-a})$. For all $x \in \mathbb{Z}$, the PMF of $GDL(\beta, a)$ at $x$, i.e. $f_{GDL(\beta,a)}(x)$, is

$$e^{-a|x|}\left(1-e^{-a}\right)^{2\beta}\, {}_2F_1\!\left[\beta, \beta+|x|; 1+|x|; e^{-2a}\right] \frac{\Gamma(\beta+|x|)}{\Gamma(1+|x|)\Gamma(\beta)}. \tag{2}$$

The discrete Laplace distribution is the special case of the GDL with $\beta = 1$.

The infinite divisibility of NB immediately implies that GDL is also infinitely divisible:

Observation 10.

GDL is infinitely divisible and closed under summation. In particular, for independent $X_1 \sim GDL(\beta_1, a), \ldots, X_n \sim GDL(\beta_n, a)$, we have $\sum_{i=1}^{n} X_i \sim GDL\left(\sum_{i=1}^{n} \beta_i, a\right)$.

It will be convenient to consider the extension of the PMF to all real numbers; let $f_{GDL(\beta,a)}: \mathbb{R} \to \mathbb{R}$ be the function defined by (2). We start with the following lemma.

Lemma 11.

For $\beta \in (0,1)$, $f_{GDL(\beta,a)}$ is decreasing and log convex on $[0, \infty)$.

Proof.

Because the product of log convex and decreasing functions is log convex and decreasing, and $e^{-ax}$ is both log convex and decreasing, we focus on the remaining relevant term:

$${}_2F_1\!\left[\beta, \beta + x; 1 + x; e^{-2a}\right] \frac{\Gamma(\beta + x)}{\Gamma(1 + x)\Gamma(\beta)} = \sum_{s=0}^{\infty} \frac{\Gamma(\beta+s)\Gamma(\beta+x+s)}{\Gamma(\beta)^2\, \Gamma(1+x+s)\, s!}\, e^{-2as} = \sum_{s=0}^{\infty} g(x)\, \frac{\Gamma(\beta+s)\, e^{-2as}}{\Gamma(\beta)^2\, s!},$$

where $g(x) = \frac{\Gamma(\beta+x+s)}{\Gamma(1+x+s)}$. First we show that $g'(x) = g(x)\left(\psi(x+s+\beta) - \psi(x+s+1)\right) < 0$ and that $\frac{d^2}{dx^2} \log g(x) = \psi'(\beta+x+s) - \psi'(1+x+s) \ge 0$. The derivatives follow from the definition of $\psi$, and the inequalities follow from Observation 2 together with $\beta < 1$. Finally, the result follows as a sum of decreasing and log convex functions is decreasing and log convex.

We also need the following technical lemma which, together with Lemma 11, allows us to consider only f(0)/f(Δ). Its proof is deferred to Section A.1.

Lemma 12.

Let $f: \mathbb{R} \to \mathbb{R}_{>0}$ be symmetric about 0, and decreasing and log convex on $[0, \infty)$. Then for any $x, x'$ such that $|x - x'| \le \Delta$, we have $\frac{f(x)}{f(x')} \le \frac{f(0)}{f(\Delta)}$.

We are now ready to state the privacy guarantee of the GDL-noise addition mechanism.

Theorem 13.

For any $\Delta \in \mathbb{N}$ and $\beta, a > 0$, the $GDL(\beta, a)$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ iff

$$\varepsilon \ge \begin{cases} a\Delta + \log\left(\dfrac{{}_2F_1[\beta, \beta; 1; e^{-2a}]}{{}_2F_1[\beta, \beta+\Delta; 1+\Delta; e^{-2a}]} \cdot \dfrac{\Gamma(\Delta+1)\Gamma(\beta)}{\Gamma(\beta+\Delta)}\right) & 0 < \beta < 1 \\[2ex] a\Delta & \beta \ge 1 \end{cases}$$
Proof.

For the $0 < \beta < 1$ case, by Lemmas 8 and 12, along with the fact that $f_{GDL(\beta,a)}$ is decreasing and log convex on $[0, \infty)$ and symmetric about 0, the $GDL(\beta,a)$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ iff

$$\varepsilon \ge \max_{\xi \in \{-\Delta, \ldots, \Delta\}} \max_{x} \log\frac{f_{GDL(\beta,a)}(x - \xi)}{f_{GDL(\beta,a)}(x)} \overset{\text{Lemma 12}}{=} \log\frac{f_{GDL(\beta,a)}(0)}{f_{GDL(\beta,a)}(\Delta)},$$

which is exactly the claimed bound.

For $\beta \ge 1$, we decompose the mechanism by observing (see Observation 10) that $Z \sim GDL(\beta, a)$ can be sampled as $Z = Z_1 + Z_2$ where $Z_1 \sim DLap(a)$, $Z_2 \sim GDL(\beta - 1, a)$ are independent. By post-processing (Lemma 6), the $GDL(\beta,a)$-noise addition mechanism is at least as private as the $DLap(a)$-noise addition mechanism, which is $a\Delta$-DP for sensitivity $\Delta$.

For the tightness claim, first note that Equation 1 yields

$$\log\left(\frac{f_{GDL(\beta,a)}(k)}{f_{GDL(\beta,a)}(k+\Delta)}\right) = a\Delta + \log\left(\frac{\sum_{s=0}^{\infty} \frac{(\beta)_s (\beta+k)_s\, e^{-2as}}{(1+k)_s\, s!}}{\sum_{s=0}^{\infty} \frac{(\beta)_s (\beta+k+\Delta)_s\, e^{-2as}}{(1+k+\Delta)_s\, s!}} \cdot \frac{\Gamma(\beta+k)\Gamma(1+k+\Delta)}{\Gamma(1+k)\Gamma(\beta+k+\Delta)}\right).$$

Observe that

$$\frac{(\beta)_s (\beta+k)_s\, e^{-2as} / \left((1+k)_s\, s!\right)}{(\beta)_s (\beta+k+\Delta)_s\, e^{-2as} / \left((1+k+\Delta)_s\, s!\right)} = \prod_{i=0}^{s-1} \frac{\beta+k+i}{1+k+i} \cdot \frac{1+k+\Delta+i}{\beta+k+\Delta+i} \ge 1,$$

because every term in the product is at least one (recall that $\beta \ge 1$ here). Plugging this into the above, we get

$$\log\left(\frac{f_{GDL(\beta,a)}(k)}{f_{GDL(\beta,a)}(k+\Delta)}\right) \ge a\Delta + \log\left(\frac{\Gamma(\beta+k)\Gamma(1+k+\Delta)}{\Gamma(1+k)\Gamma(\beta+k+\Delta)}\right) = a\Delta + \log\left(\frac{(1+k)_\Delta}{(\beta+k)_\Delta}\right).$$

Thus, $\lim_{k \to \infty} \log\left(\frac{f_{GDL(\beta,a)}(k)}{f_{GDL(\beta,a)}(k+\Delta)}\right) \ge a\Delta$, meaning that our bound is tight.

The exact privacy bound for GDL above (for 0<β<1) is fairly unwieldy. We show a simplified bound below, as well as a tighter, slightly more complex bound in Section A.2.

Corollary 14.

For any $\Delta \in \mathbb{N}$, $a > 0$, $\beta \in (0,1)$, the $GDL(\beta,a)$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ where $\varepsilon \ge a\Delta + \log\frac{\Delta}{\beta}$.

Proof.

Since $\frac{(\beta)_s}{s!} \le \frac{(\beta+\Delta)_s}{(1+\Delta)_s}$, we can bound the ratio of the hypergeometric functions by

$$\frac{{}_2F_1[\beta, \beta; 1; e^{-2a}]}{{}_2F_1[\beta, \beta+\Delta; 1+\Delta; e^{-2a}]} = \frac{\sum_{s=0}^{\infty} \frac{(\beta)_s (\beta)_s}{s!\, s!}\, e^{-2as}}{\sum_{s=0}^{\infty} \frac{(\beta)_s (\beta+\Delta)_s}{(1+\Delta)_s\, s!}\, e^{-2as}} \le 1.$$

Thus, from Theorem 13, the mechanism is $\varepsilon$-DP for all $\varepsilon \ge a\Delta + \log\frac{\Gamma(\Delta+1)\Gamma(\beta)}{\Gamma(\beta+\Delta)}$. Observe also that $\frac{\Gamma(\Delta+1)\Gamma(\beta)}{\Gamma(\beta+\Delta)} = \frac{\Delta}{\beta} \cdot \frac{(\Delta-1)!}{(\beta+1)_{\Delta-1}} \le \frac{\Delta}{\beta}$. Combining these yields the desired claim.

Finally, we prove our accuracy guarantee for the GDL mechanism in the high ε regime.

Theorem 15.

For any $\Delta \in \mathbb{N}$ and $\varepsilon > 2 + \log\Delta$, the $GDL(\Delta e^{2-\varepsilon}, 2/\Delta)$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ and has MSE $O(\Delta^3 e^{-\varepsilon})$.

Proof.

Since $\varepsilon > 2 + \log\Delta$, we have $\beta = \Delta e^{2-\varepsilon} < 1$. Thus, Corollary 14 implies the privacy guarantee. Finally, the MSE is $2\,\mathrm{Var}\left(NB(\beta, 1-e^{-a})\right) = \frac{\Delta e^{2-\varepsilon}}{\cosh(2/\Delta) - 1} = O(\Delta^3 e^{-\varepsilon})$.
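For illustration, here is a minimal sketch of this parameterization (ours, not from the paper; it uses floating-point NumPy sampling for brevity, whereas the samplers of Section 5 avoid real arithmetic):

```python
import numpy as np

rng = np.random.default_rng(0)

def gdl_noise(beta: float, a: float) -> int:
    # GDL(beta, a) is the difference of two i.i.d. NB(beta, 1-e^{-a}) samples.
    p = 1.0 - np.exp(-a)
    return int(rng.negative_binomial(beta, p)) - int(rng.negative_binomial(beta, p))

def gdl_mechanism(true_value: int, delta: int, eps: float) -> int:
    # Theorem 15 parameterization: beta = Delta * e^{2-eps}, a = 2/Delta.
    # Requires eps > 2 + log(Delta) so that beta < 1.
    assert eps > 2.0 + np.log(delta)
    return true_value + gdl_noise(delta * np.exp(2.0 - eps), 2.0 / delta)
```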

3.2 The multi-scale discrete Laplace mechanism

The $(\varepsilon, \Delta)$-multi-scale discrete Laplace ($(\varepsilon, \Delta)$-MSDLap) distribution with parameters $\varepsilon > 0$, $\Delta \in \mathbb{N}$ is defined as the distribution of $\sum_{i=1}^{\Delta} i \cdot X_i$ where $X_1, \ldots, X_\Delta \overset{i.i.d.}{\sim} DLap(\varepsilon)$. From Observation 4 and Observation 5, the $(\varepsilon, \Delta)$-MSDLap distribution is infinitely divisible.
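Since each $DLap(\varepsilon)$ term decomposes into negative binomial shares, the MSDLap noise can be split across $n$ parties. A minimal sketch (our illustration, using NumPy with floating point for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def msdlap_share(eps: float, delta: int, n: int) -> int:
    # One of n parties' shares of (eps, delta)-MSDLap noise: at each scale
    # i = 1..delta, a DLap(eps) share as a difference of NB(1/n, 1-e^{-eps}).
    p = 1.0 - np.exp(-eps)
    return sum(
        i * (int(rng.negative_binomial(1.0 / n, p)) - int(rng.negative_binomial(1.0 / n, p)))
        for i in range(1, delta + 1)
    )
```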

This mechanism is ε-DP, and its accuracy guarantee matches that of Theorem 15:

Theorem 16.

For any $\varepsilon > 0$ and $\Delta \in \mathbb{N}$, the $(\varepsilon, \Delta)$-MSDLap-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$. Furthermore, for $\varepsilon \ge 1$, the MSE is $O(\Delta^3 e^{-\varepsilon})$.

Proof.

(Privacy) From Lemma 8 and the symmetry of the noise around 0, it suffices to show $D_\infty\left(\xi + \sum_{i=1}^{\Delta} i X_i \,\middle\|\, \sum_{i=1}^{\Delta} i X_i\right) \le \varepsilon$ for all $\xi \in [\Delta]$. From Lemma 6, we have

$$D_\infty\left(\xi + \sum_{i=1}^{\Delta} i X_i \,\middle\|\, \sum_{i=1}^{\Delta} i X_i\right) \le D_\infty\left(\xi + \xi X_\xi \,\middle\|\, \xi X_\xi\right) = D_\infty\left(1 + X_\xi \,\middle\|\, X_\xi\right) \le \varepsilon,$$

where the last inequality follows from $X_\xi \sim DLap(\varepsilon)$.

(Accuracy) The MSE is $\mathrm{Var}\left(\sum_{i=1}^{\Delta} i X_i\right) = \sum_{i=1}^{\Delta} i^2\, \mathrm{Var}(X_i) = O(\Delta^3 e^{-\varepsilon})$.

Theorem 16 implies that, for fixed $\Delta$ and sufficiently large $\varepsilon$, the $(\varepsilon, \Delta)$-MSDLap noise achieves asymptotically optimal MSE. In fact, below we prove a stronger statement: as $\varepsilon \to \infty$, the ratio between the MSE of MSDLap and the optimal MSE [17, 19] approaches 1.

Corollary 17.

For a fixed $\Delta$, the ratio of the MSE of the $(\varepsilon, \Delta)$-MSDLap-noise addition mechanism and that of the $\varepsilon$-DP discrete staircase mechanism for sensitivity $\Delta$ [17, 19] approaches 1 as $\varepsilon \to \infty$.

Proof.

As we show in Observation 35 (in Section A.3), the MSE of the $\varepsilon$-DP discrete staircase mechanism for sensitivity $\Delta$ is at least $\frac{1}{e^{\varepsilon} + (2\Delta - 1)} \cdot \frac{\Delta(\Delta+1)(2\Delta+1)}{3}$ for any sufficiently large $\varepsilon$. Thus, in this regime, the ratio between the two MSEs is at most

$$\frac{\frac{\Delta(\Delta+1)(2\Delta+1)}{6(\cosh(\varepsilon) - 1)}}{\frac{1}{e^{\varepsilon}+(2\Delta-1)} \cdot \frac{\Delta(\Delta+1)(2\Delta+1)}{3}} = \frac{1 + (2\Delta - 1)e^{-\varepsilon}}{1 - 2e^{-\varepsilon} + e^{-2\varepsilon}}.$$

The RHS approaches 1 as $\varepsilon \to \infty$, as claimed.

While the naive approach to sample from the MSDLap distribution requires sampling O(Δ) random variables, we show in Section 5 an efficient algorithm for the high ε regime.

3.2.1 Generalizing the MSDLap mechanism

We can also generalize the MSDLap mechanism to match the error in [18] for every setting of parameters $\Delta, \varepsilon$. We state this below, where $r \in \{0, \ldots, \Delta\}$ is a parameter matching the "r" parameter in the discrete staircase mechanism in [18]. In our results henceforth, we assume that $\varepsilon \ge 2$ for simplicity; this 2 can be changed to any constant (by changing the distribution of $Y$ in the proof of Theorem 18 to $DLap(c/r)$ for some larger $c > 1$), but we keep it for simplicity of the distribution description.

Theorem 18.

For any $\varepsilon \ge 2$, $\Delta \in \mathbb{N}$ and $r \in \{0, \ldots, \Delta\}$, there is an infinitely divisible discrete noise-addition mechanism that is $\varepsilon$-DP for sensitivity $\Delta$ and has MSE $O\left(r^2 + \frac{e^{-\varepsilon}\Delta^3}{r+1}\right)$.

Plugging in $r = \lfloor e^{-\varepsilon/3}\Delta \rfloor$ gives the following, which will be useful later in Section 6.

Corollary 19.

For any $\varepsilon \ge 2$ and $\Delta \in \mathbb{N}$, there exists an infinitely divisible discrete noise-addition mechanism that is $\varepsilon$-DP for sensitivity $\Delta$ with MSE $O\left(\Delta^2 \min\{e^{-\varepsilon}\Delta, e^{-2\varepsilon/3}\}\right)$.

We are now ready to prove Theorem 18. The rough idea is that, instead of using only the $(\varepsilon, \Delta)$-MSDLap noise, we first add a scaled-up $(\varepsilon - 1, \Delta_0)$-MSDLap noise where $\Delta_0 < \Delta$. The scaling up leaves us with "holes" in the noise distribution. To fix this, we additionally add another DLap noise to "smoothen out the holes". This idea is formalized below.

Proof of Theorem 18.

The case $r = 0$ follows from Theorem 16.

For $r \ge 1$, let $\Delta_0 = \lceil \Delta/r \rceil$. Let the noise distribution $\mathcal{D}$ be the distribution of $Z = rX + Y$ where $X \sim (\varepsilon - 1, \Delta_0)$-MSDLap and $Y \sim DLap(1/r)$ are independent. The infinite divisibility of $\mathcal{D}$ follows from the infinite divisibility of MSDLap, DLap and Observations 4 and 5.

(Privacy) From Lemma 8, it suffices to show $D_\infty(\xi + Z \| Z) \le \varepsilon$ for all $\xi \in \{-\Delta, \ldots, \Delta\}$. Let $i = \lfloor \xi/r \rfloor$ and $j = \xi - ri$. Note that $i \in \{-\Delta_0, \ldots, \Delta_0\}$ and $j \in \{0, \ldots, r-1\}$. From Lemmas 7 and 6,

$$D_\infty(\xi + Z \| Z) = D_\infty(ri + j + rX + Y \| rX + Y) \le D_\infty(ri + j + rX + Y \| ri + rX + Y) + D_\infty(ri + rX + Y \| rX + Y) \le D_\infty(j + Y \| Y) + D_\infty(i + X \| X) \le 1 + (\varepsilon - 1) = \varepsilon,$$

where the last inequality follows from $Y \sim DLap(1/r)$ and $X \sim (\varepsilon - 1, \Delta_0)$-MSDLap (together with Theorem 16 and Lemma 8).

(Accuracy) The MSE is $r^2\, \mathrm{Var}(X) + \mathrm{Var}(Y) \le O(r^2 \Delta_0^3 e^{-\varepsilon}) + O(r^2) = O(r^2 + e^{-\varepsilon}\Delta^3 / r)$.

 Remark 20.

We can generalize the proof of Theorem 18 further by considering the $(\mathcal{D}_1, \mathcal{D}_2)$-generalized-multi-scale mechanism that adds noise from two distributions $\mathcal{D}_1, \mathcal{D}_2$ where

  • the $\mathcal{D}_1$-noise addition mechanism is $\varepsilon_1$-DP for $\Delta = 1$, and is added at multiple scales;

  • the $\mathcal{D}_2$-noise addition mechanism is $\varepsilon_2$-DP for $\Delta = r$, and is used to "smoothen out the holes".

Specifically, the noise is $r\left(\sum_{i=1}^{\Delta_0} i X_i\right) + Y$ where $X_1, \ldots, X_{\Delta_0} \overset{i.i.d.}{\sim} \mathcal{D}_1$ and $Y \sim \mathcal{D}_2$ are independent. The privacy proof proceeds identically to Theorem 18, yielding an $(\varepsilon_1 + \varepsilon_2)$-DP mechanism. This allows us to take both $\mathcal{D}_1$ and $\mathcal{D}_2$ to be GDL distributions, which gives us the multi-scale version of the GDL mechanism; this noise distribution is additionally closed under summation.

We plot the MSE of our new mechanisms and the established baselines in Figure 1.

Figure 1: The MSE of the GDL mechanism (Theorem 15) and the MSDLap mechanism (Theorem 18) with optimized $r$. We include the discrete Laplace and staircase (Section A.3) baselines. In the high $\varepsilon$ regime our mechanisms closely track the MSE of the discrete staircase.
Figure 2: (a) The PMF of the GDL distribution parameterized by Theorem 15, the MSDLap distribution (Theorem 16), and the discrete Laplace distribution with $a = \varepsilon/\Delta$. GDL has a much sharper peak around 0, before flattening out and decreasing more slowly than discrete Laplace. MSDLap has a "staircase"-shaped distribution with sharp drops at $\Delta$-width intervals. Its PMF appears fully dominated by the discrete Laplace's, except at multiples of $\Delta$. (b) The MSE of the continuous GDL (Theorem 15) and MSDLap (Theorem 16) after the continuous transformation of Theorem 21 is applied to them. We also plot the Arete [39, Lemma 3] and continuous staircase [18, eq. 51] mechanisms as baselines.

4 From Discrete to Continuous

We next show simple methods to transform discrete noises to continuous ones. The approach is similar to Theorem 18, except that we use Laplace noise to “smoothen out the holes”.

Theorem 21.

For $\varepsilon \ge 2$ and $\Delta > 0$, there exists a continuous infinitely divisible noise-addition mechanism that is $\varepsilon$-DP for sensitivity $\Delta$ with MSE $O(\Delta^2 e^{-2\varepsilon/3})$.

Proof.

By scaling, we may assume w.l.o.g. that $\Delta = 1$.

Let $\varepsilon_d = \varepsilon - 1$, $\Delta_d = \lceil e^{\varepsilon/3} \rceil$ and $r = \frac{1}{\Delta_d}$. Let $\mathcal{D}_d$ be any infinitely divisible discrete distribution such that the $\mathcal{D}_d$-noise addition mechanism is $\varepsilon_d$-DP for sensitivity $\Delta_d$ with MSE $O(\Delta_d^3 e^{-\varepsilon_d})$. (Such a distribution exists due to Theorem 16. GDL also meets the requirements, but only for a specific regime of $\varepsilon$: Theorem 15 requires $\varepsilon_d > 2 + \log(\Delta_d) > 2 + \varepsilon/3$, so for $\varepsilon_d = \varepsilon - 1$ the continuous transformation of GDL is valid when $\varepsilon > 4.5$.)

Let $\mathcal{D}$ be the distribution of $Z = rX + Y$ where $X \sim \mathcal{D}_d$ and $Y \sim Lap(r/2)$ are independent. Since $X, Y$ are infinitely divisible, Observations 4 and 5 imply that $\mathcal{D}$ is infinitely divisible.

(Privacy) From Lemma 8, it suffices to show $D_\infty(\xi + Z \| Z) \le \varepsilon$ for all $\xi \in [-1, 1]$. Let $i$ be the closest integer to $\xi/r$ and let $j = \xi - ri$. Note that $i \in \{-\Delta_d, \ldots, \Delta_d\}$ and $j \in [-r/2, r/2]$. (This choice of $i$, which halves the maximum value of $|j|$, cannot be directly applied to Theorem 18: there we take $\Delta_0 = \lceil \Delta/r \rceil$, while the approach here only works when $\Delta_0 \ge \Delta/r$.) From Lemmas 7 and 6, we have

$$D_\infty(\xi + Z \| Z) = D_\infty(ri + j + rX + Y \| rX + Y) \le D_\infty(ri + j + rX + Y \| ri + rX + Y) + D_\infty(ri + rX + Y \| rX + Y) \le D_\infty(j + Y \| Y) + D_\infty(i + X \| X) \le 1 + (\varepsilon - 1) = \varepsilon,$$

where the last inequality follows from $Y \sim Lap(r/2)$ and from the $\mathcal{D}_d$-noise addition mechanism being $(\varepsilon - 1)$-DP for sensitivity $\Delta_d$ (together with Lemma 8).

(Accuracy) The MSE is $\mathrm{Var}(rX + Y) = r^2\, \mathrm{Var}(X) + \mathrm{Var}(Y) \le O(\Delta_d e^{-\varepsilon}) + O\left(\left(\tfrac{1}{\Delta_d}\right)^2\right) \le O(e^{-2\varepsilon/3})$, where the last inequality is from our choice $\Delta_d = \Theta(e^{\varepsilon/3})$.
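A minimal sketch of the resulting continuous mechanism for $\Delta = 1$ (our illustration, instantiating $\mathcal{D}_d$ with MSDLap; exactness of sampling is not addressed here):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def continuous_msdlap_noise(eps: float) -> float:
    # Z = r*X + Y with X ~ (eps-1, Delta_d)-MSDLap, Y ~ Lap(r/2),
    # Delta_d = ceil(e^{eps/3}), r = 1/Delta_d (Theorem 21 with Delta = 1).
    delta_d = math.ceil(math.exp(eps / 3.0))
    r = 1.0 / delta_d
    p = 1.0 - math.exp(-(eps - 1.0))
    x = sum(
        i * (int(rng.negative_binomial(1, p)) - int(rng.negative_binomial(1, p)))
        for i in range(1, delta_d + 1)
    )
    y = float(rng.laplace(scale=r / 2.0))
    return r * x + y
```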

Similar to Remark 20, we may also consider distributions $\mathcal{D}_d$ other than the MSDLap distribution, as long as the corresponding mechanism satisfies $\varepsilon_d$-DP for sensitivity $\Delta_d$. In particular, we may use the GDL distribution. We plot the results of transforming the discrete mechanisms from Section 3 in Figure 2.

5 Efficiently and exactly sampling from the distributed MSDLap

This section outlines an algorithm to efficiently sample from the MSDLap mechanism described in Theorem 16 in the distributed setting over $n$ parties. Recall that the $(\varepsilon, \Delta)$-MSDLap noise is defined as $\sum_{i=1}^{\Delta} i X_i$ where $X_1, \ldots, X_\Delta \overset{i.i.d.}{\sim} DLap(\varepsilon) \sim NB(1, 1-e^{-\varepsilon}) - NB(1, 1-e^{-\varepsilon})$. In other words, each of the $n$ parties needs to sample $\sum_{i=1}^{\Delta} i (U_i - V_i)$ where $U_1, \ldots, U_\Delta, V_1, \ldots, V_\Delta \overset{i.i.d.}{\sim} NB(1/n, 1-e^{-\varepsilon})$. Thus, the naive algorithm requires each party to sample from $k = 2\Delta$ negative binomial random variables. For large $\Delta$, or for $\Delta = O(e^{\varepsilon/3})$ as in Corollary 19, this may be computationally expensive.

Our algorithm resolves this issue, allowing us to sample from exponentially many (in ε) negative binomial random variables in time polynomial in ε.

Theorem 22.

For input $k \in \mathbb{N}$, $r, \gamma > 0$, and $p = e^{-\gamma}$, Algorithm 3 returns the non-zero samples among $k$ i.i.d. samples of $NB(r, 1-e^{-\varepsilon})$ and completes in $\tilde{O}(1 + kre^{-\varepsilon})$ steps in expectation, where $\varepsilon = \log\frac{1}{1 - e^{-\gamma}}$ and $\tilde{O}$ hides polynomial factors in $\varepsilon$.

We leverage the fact that in the high ε regime, most of these NB random variables will be 0. We re-frame the problem of sampling many negative binomials into two separate problems:

  • Sampling from the sum of many i.i.d. NBs.

  • Fairly allocating the result across each r.v.

While sampling from the sum of many NBs is simple on its face given their infinite divisibility, standard samplers for $NB(r, p)$ (e.g. [28]) take time linear in $r$, which is not desirable. Below we describe an algorithm (Algorithm 1) whose (expected) running time only scales with the mean of $NB(r, p)$, which is only $O(re^{-\varepsilon})$ in our setting.

To fairly allocate across the random variables, we leverage the fact that the conditional distribution of the sequence of NB random variables given their sum follows the Dirichlet multinomial distribution, denoted $\mathrm{DirM}(n, \boldsymbol{\alpha})$ where $\boldsymbol{\alpha} = \{\alpha_1, \ldots, \alpha_k\}$ and $\alpha_0 = \sum_{i=1}^{k} \alpha_i$. Its PMF is $f_{\mathrm{DirM}(n, \boldsymbol{\alpha})}(x) = \frac{\Gamma(\alpha_0)\Gamma(n+1)}{\Gamma(n+\alpha_0)} \prod_{i=1}^{k} \frac{\Gamma(x_i + \alpha_i)}{\Gamma(\alpha_i)\Gamma(x_i+1)}$.

Lemma 23 (e.g. [45, 42]).

Let $X = \{X_1, \ldots, X_k\}$ be such that $X_i \sim NB(\alpha_i, p)$ are independent. Let $T = \sum_{i=1}^{k} X_i$. Then the conditional distribution $X \mid T = t \sim \mathrm{DirM}(t, \boldsymbol{\alpha})$.

Our sampler can be implemented on a finite computer in the Word RAM model avoiding any real-arithmetic operations. Following [11, Section 5], we focus on the runtime in the expected number of arithmetic operations, which take only polynomial time in the bit complexity of the parameters. We make the following assumptions on the availability of sampling primitives requiring only O(1) arithmetic operations in expectation:

  • A uniform sampler to draw $D \sim \{1, 2, \ldots, d\}$ for $d \in \mathbb{N}$.

  • A $Ber(n/d)$ sampler for $n, d \in \mathbb{N}$, which trivially follows from the uniform sampler.

  • A $\mathrm{Geo}(1 - e^{-\gamma})$ sampler for rational $\gamma > 0$ from [11].

Finally, we assume a map data structure with O(1) accesses and updates in expectation, and a vector data structure with O(1) random access and append operations.

Algorithm 1 Fast NB sampler for $p > \frac{1}{2}$.

Input: $r > 0$, $p = e^{-\gamma}$ for $\gamma > 0$
Output: A sample from $NB(r, p)$
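The body of Algorithm 1 is not reproduced here. The following Python sketch reconstructs it from the proof of Proposition 24; it uses floating-point inverse-CDF geometric draws for brevity, whereas the paper assumes the exact $\mathrm{Geo}(1-e^{-\gamma})$ sampler of [11]:

```python
import math
import random

def geo(q: float) -> int:
    # Geo(q): number of failures before the first success (success prob. q).
    # Inverse-CDF sampling; a stand-in for the exact sampler assumed in the text.
    return int(math.log(1.0 - random.random()) / math.log(1.0 - q))

def int_sample(r: int, p: float) -> int:
    # NB(r, p) for integer r: count geometric draws whose running sum stays
    # below r. P[output = 0] = P[X_1 >= r] = p^r, matching f_{NB(r,p)}(0).
    k, s = 0, 0
    while True:
        x = geo(1.0 - p)
        if s + x >= r:
            return k
        s, k = s + x, k + 1

def nb_sample(r: float, p: float) -> int:
    # NB(r, p) for real r > 0: rejection against NB(ceil(r), p); a proposal w
    # is accepted with probability (r)_w / (ceil(r))_w, so P[accept] = p^{ceil(r)-r}.
    rc = math.ceil(r)
    while True:
        w = int_sample(rc, p)
        log_acc = sum(math.log((r + i) / (rc + i)) for i in range(w))
        if math.log(1.0 - random.random()) <= log_acc:
            return w
```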

Algorithm 2 Dirichlet multinomial sampler.

Input: $n \in \mathbb{N}$, $k \in \mathbb{N}$, $\alpha = a/b > 0$
Output: A sample from $\mathrm{DirM}(n, \{\alpha, \ldots, \alpha\})$, encoded as a sparse map from variate index to count, with zero variates removed.
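Again the body is omitted above; the sketch below reconstructs the modified Pólya urn from the proof of Proposition 25, with $\alpha = a/b$ for integers $a, b$ (our reconstruction, not verbatim from the paper):

```python
import random

def dirm_sample(n: int, k: int, a: int, b: int) -> dict:
    # DirM(n, {a/b, ..., a/b}) via a Pólya urn: a initial balls per color,
    # b copies added after each pick. Returned as a sparse {color: count} map.
    counter: dict = {}
    picked: list = []  # colors chosen so far; ball (k*a + b*j) is a copy of picked[j]
    for _ in range(n):
        d = random.randrange(k * a + b * len(picked))
        if d < k * a:
            color = d // a                    # one of the k*a initial balls
        else:
            color = picked[(d - k * a) // b]  # one of the b copies of a past pick
        picked.append(color)
        counter[color] = counter.get(color, 0) + 1
    return counter
```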

Algorithm 3 Sparse NB sampler.

Input: $k \in \mathbb{N}$, $r > 0$, $p = e^{-\gamma}$ for $\gamma > 0$
Output: Non-zero samples among $X_1, \ldots, X_k$ where $X_i \sim NB(r, p)$
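Composing the two sketches above gives a reconstruction of Algorithm 3: draw the total (the sum of $k$ i.i.d. $NB(r, p)$ variables is $NB(kr, p)$ by infinite divisibility), then allocate it across variates via Lemma 23:

```python
def sparse_nb(k: int, a: int, b: int, p: float) -> dict:
    # Non-zero entries of k i.i.d. NB(a/b, p) samples, as {index: count}.
    # Reuses nb_sample and dirm_sample from the sketches above.
    total = nb_sample(k * a / b, p)
    return dirm_sample(total, k, a, b)
```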

Our Algorithm 3 is straightforward. It consists of:

  1. Sampling the sum of all the terms (distributed as $NB(kr, p)$) with Algorithm 1, handling rational values of $r$ following the approach in [28] using a simple rejection sampler.

  2. Sampling from the Dirichlet multinomial distribution with Algorithm 2, which uses a version of the Pólya urn process [31] modified to handle rational fractions of balls. We sparsely encode the output to avoid storing zero entries, as MSDLap sampling only requires summing non-zero random variables.

Proposition 24.

For input $r, \gamma > 0$ and $p = e^{-\gamma}$, Algorithm 1 returns one sample from $NB(r, 1-e^{-\varepsilon})$ and completes in $\tilde{O}(1 + re^{-\varepsilon})$ arithmetic operations in expectation, where $\varepsilon = \log\frac{1}{1 - e^{-\gamma}}$ and $\tilde{O}$ hides polynomial factors in $\varepsilon$.

Proof.

We begin by analyzing the case $r \in \mathbb{N}$, i.e., the IntSample subroutine. Let $K$ be the r.v. describing the output, and $X_i$ be the $i$th $\mathrm{Geo}(1-p)$ r.v. Denote $Z = \sum_{i=1}^{k} X_i \sim NB(k, 1-p)$. Note that $\Pr[K = 0] = \Pr[X_1 \ge r] = p^r = f_{NB(r,p)}(0)$. For $k \ge 1$,

$$\Pr[K = k] = \Pr[Z < r \le Z + X_{k+1}] = \sum_{z=0}^{r-1} \sum_{x=r-z}^{\infty} f_{NB(k, 1-p)}(z)\, f_{\mathrm{Geo}(1-p)}(x) = \sum_{z=0}^{r-1} \sum_{x=r-z}^{\infty} (1-p)^k p^z \binom{k+z-1}{z} (1-p)\, p^x$$
$$= (1-p)^{k+1} \sum_{z=0}^{r-1} \binom{k+z-1}{k-1} \frac{p^r}{1-p} = (1-p)^k p^r \binom{k+r-1}{k} = f_{NB(r,p)}(k),$$

where the final step uses the hockey-stick identity (Lemma 3). Thus, the output $K$ follows the $NB(r, p)$ distribution as desired.

The expected number of iterations of the loop is exactly $1 + \mathbb{E}[NB(r,p)]$. Each iteration of the loop takes expected $\tilde{O}(1)$ time, as the geometric sampler requires arithmetic operations polynomial in the bit complexity of $\gamma$ (which is $O(\varepsilon)$).

Next we analyze the outer loop, which handles $r > 0$ using the accept-reject approach in [28], ensuring that in each iteration, for a proposed sample $W$ and acceptance event $A$:

$$\Pr[W = w \wedge A] = \Pr[A \mid W = w]\,\Pr[W = w] = \frac{(r)_w}{(\lceil r \rceil)_w}\, f_{NB(\lceil r \rceil, p)}(w) = p^{\lceil r \rceil - r}\, f_{NB(r,p)}(w).$$

This implies that the output follows the $NB(r, p)$ distribution as desired, and that $\Pr[A] = p^{\lceil r \rceil - r}$. The latter in turn implies that the number of trials follows a geometric distribution with success probability $p^{\lceil r \rceil - r}$. Therefore the expected number of trials is $O(1/p^{\lceil r \rceil - r}) = O(1/p)$, and the result follows as $\tilde{O}\left(\frac{1}{1-e^{-\varepsilon}}\left(1 + \mathbb{E}[NB(r, 1-e^{-\varepsilon})]\right)\right) = \tilde{O}(1 + re^{-\varepsilon})$.

Proposition 25.

For input $n, k \in \mathbb{N}$ and rational $\alpha > 0$, Algorithm 2 returns one sample from $\mathrm{DirM}(n, \{\alpha, \ldots, \alpha\})$ and requires $O(n)$ arithmetic operations in expectation.

Proof.

We first map Algorithm 2 to the Pólya urn model:

  • To initialize the urn contents, $a$ balls are added to the urn for each of the $k$ colors.

  • The algorithm proceeds over $n$ steps. At each step, a ball is chosen uniformly at random from the urn. When a ball is selected, it is replaced, along with $b$ other balls of that color.

  • For each color, the algorithm returns the count of how many times that color was chosen.

Algorithm 2 implements this, as idx maps to the idx-th color $c_{\mathrm{idx}}$, and picked is the list of selected colors. Note that the return value is sparsely encoded to avoid storing zero entries, i.e., colors that have never been picked. For purposes of the proof of correctness we assume an unrolled output of counter equal to $X = \{x_1, \ldots, x_k\}$, where each $x_i$ counts the number of times the color $c_i$ was picked. This can be obtained with a simple post-processing step.

Let $S_m$ be the color of the ball chosen at iteration $m$. The probability that $S_m = c$ given the previous draws is

$$\Pr[S_m = c \mid S_1 = s_1, \ldots, S_{m-1} = s_{m-1}] = \frac{a + \sum_{i=1}^{m-1} b\, \mathbb{1}(s_i = c)}{ka + b(m-1)} = \frac{a + bz}{ka + b(m-1)},$$

where $z = \sum_{i=1}^{m-1} \mathbb{1}(s_i = c)$ denotes the number of previous draws of color $c$.

Note that $S_m$ depends only on the current state of the urn, not the order in which balls were picked. Similarly, note that the denominator of the fraction is independent of any information about $z$. From this we can show that the probability of seeing any one particular sequence $S = \{s_1, \ldots, s_n\}$ which has color counts $X = \{x_1, \ldots, x_k\}$ is

$$\Pr[S_1 = s_1, \ldots, S_n = s_n] = \left(\prod_{i=1}^{n} \frac{1}{ka + b(i-1)}\right) \prod_{j=1}^{k} \prod_{t=1}^{x_j} \left(a + b(t-1)\right) = \frac{\prod_{j=1}^{k} b^{x_j} (a/b)_{x_j}}{b^n\, (ka/b)_n} = \frac{\Gamma(k\alpha)}{\Gamma(n + k\alpha)} \prod_{j=1}^{k} \frac{\Gamma(\alpha + x_j)}{\Gamma(\alpha)}.$$

Since there are $\binom{n}{x_1, x_2, \ldots, x_k} = \frac{\Gamma(n+1)}{\prod_{i=1}^{k} \Gamma(x_i + 1)}$ sequences $S$ with color count $x$, we have

$$\Pr[X = x] = \frac{\Gamma(k\alpha)\Gamma(n+1)}{\Gamma(n + k\alpha)} \prod_{j=1}^{k} \frac{\Gamma(\alpha + x_j)}{\Gamma(x_j + 1)\Gamma(\alpha)} = f_{\mathrm{DirM}(n, \{\alpha, \ldots, \alpha\})}(x),$$

proving the correctness.

For the run time analysis, both loops in the algorithm iterate $n$ times, and each iteration requires only a constant number of arithmetic operations in expectation, assuming $O(1)$ map and vector operations.

6 Order optimal MSE in the multi-message shuffle model

We next apply the noises from Section 3 to protocols in the shuffle model of DP.

First, we recall the definition of the shuffle model [13, 16]. An $m$-message protocol in the shuffle model consists of a randomizer $\mathcal{R}: \mathcal{X} \to \mathcal{Y}^m$, where $\mathcal{Y}$ denotes the set of all possible messages, and an analyzer $\mathcal{A}: \mathcal{Y}^{nm} \to \mathcal{O}$, where $\mathcal{O}$ denotes the output domain. In the shuffle model, the analyst does not see the output of each randomizer directly but only the randomly shuffled messages. We write $\mathcal{S} \circ \mathcal{R}(x_1, \ldots, x_n)$ to denote the output of randomly shuffling the $nm$ (random) messages produced by $\mathcal{R}(x_1), \ldots, \mathcal{R}(x_n)$ for $x_1, \ldots, x_n \in \mathcal{X}$. The shuffle model requires that these shuffled messages satisfy DP, as stated more formally below.

Definition 26 ([13, 16]).

An $m$-message protocol $(\mathcal{R}, \mathcal{A})$ is $(\varepsilon, \delta)$-shuffle-DP if, for every $x, x' \in \mathcal{X}^n$ differing in a single entry, $\Pr[\mathcal{S} \circ \mathcal{R}(x) \in S] \le e^{\varepsilon} \Pr[\mathcal{S} \circ \mathcal{R}(x') \in S] + \delta$ for all $S \subseteq \mathcal{Y}^{nm}$.

In the real summation problem, each $x_i$ is a real number in $[0, 1]$ and the goal is to compute $\sum_{i \in [n]} x_i$. Our main result of this section can be stated as follows:

Theorem 27.

For $\varepsilon \ge 2$ and $\delta \in (0, 1/n)$, there is an $(\varepsilon, \delta)$-shuffle-DP, $O\left(\frac{\varepsilon + \log(1/\delta)}{\log(n)}\right)$-message protocol for real summation with MSE $O(e^{-2\varepsilon/3})$ where each message is $O(\varepsilon + \log n)$ bits.

Prior to this work, the best known protocols have MSE of $O(1/\varepsilon^2)$, even for large $\varepsilon$ [7, 24, 23]. (Pagh and Stausholm [39, Corollary 23] claim that their result implies a protocol with absolute error $\frac{1}{e^{\Omega(\varepsilon)} - 1}$, but, to our knowledge, this is not the case; see the discussion at the end of this section.) Thus our result provides a significant improvement in the large $\varepsilon$ regime.

To prove this result, it is convenient to define the $\mathbb{Z}_q$-summation problem: the input $x_1, \ldots, x_n$ belongs to $\mathbb{Z}_q$ and the goal is to compute $\sum_{i \in [n]} x_i \pmod{q}$. Similar to before, an $m$-message protocol consists of a randomizer $\mathcal{R}$ and an analyzer $\mathcal{A}$. The protocol is exact if the analyzer always outputs the correct answer. The protocol is $\sigma$-secure if, for all $x, x' \in \mathbb{Z}_q^n$ such that $\sum_{i \in [n]} x_i = \sum_{i \in [n]} x'_i$, we have $D_{TV}\left(\mathcal{S} \circ \mathcal{R}(x), \mathcal{S} \circ \mathcal{R}(x')\right) \le 2^{-\sigma}$. Building on the "split and mix" protocol of [30], Balle et al. [7] and Ghazi et al. [24] gave the following protocol:

Theorem 28 ([7, 24]).

For $\sigma > 0$ and $q \in \mathbb{N}$, there is a $\sigma$-secure $O\left(1 + \frac{\sigma}{\log n}\right)$-message protocol for $\mathbb{Z}_q$-summation in the shuffle model where each message is from $\mathbb{Z}_q$.

Balle et al. [7] presented an elegant method to translate a secure $\mathbb{Z}_q$-summation protocol to a shuffle-DP real summation protocol. Below, we provide a slight generalization and improvement of their result (originally, their analysis had an additional error term due to "overflow / underflow"; we observe that this is in fact unnecessary, which reduces the claimed error and also simplifies the analysis), which eventually allows us to achieve Theorem 27.

Lemma 29 (generalization of [7] Lemma 5.2).

Suppose that, for some $\varepsilon > 0$ and $\Delta \in \mathbb{N}$, there is a zero-mean discrete infinitely divisible distribution $\mathcal{D}$ such that the $\mathcal{D}$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$. Furthermore, suppose that, for some $q, n \in \mathbb{N}$ with $q \ge 2\Delta n$ and $\sigma > 0$, there exists an $n$-party $\sigma$-secure $m$-message exact $\mathbb{Z}_q$-summation protocol in the shuffle model. Then, there is an $m$-message $(\varepsilon, (1+e^{\varepsilon})2^{-\sigma})$-shuffle-DP protocol for real summation with MSE at most $\frac{\mathrm{Var}(\mathcal{D})}{\Delta^2} + \frac{n}{4\Delta^2}$.

Moreover, the message length is the same as in the q-summation protocol.

Proof.

Let $\Pi = (\mathcal{R}_\Pi, \mathcal{A}_\Pi)$ be the $\sigma$-secure exact $\mathbb{Z}_q$-summation protocol. Our protocol is $\mathcal{P} = (\mathcal{R}_\Pi \circ \mathcal{R}, \mathcal{A} \circ \mathcal{A}_\Pi)$, i.e., the final protocol composes the randomizer $\mathcal{R}$ and analyzer $\mathcal{A}$ below with those of $\Pi$:

  1. $\mathcal{R}: [0,1] \to \mathbb{Z}_q$ on input $x_i$ works by first computing a randomized encoding $y_i = \begin{cases} 1 + \lfloor \Delta x_i \rfloor & \text{w.p. } \Delta x_i - \lfloor \Delta x_i \rfloor \\ \lfloor \Delta x_i \rfloor & \text{w.p. } 1 - (\Delta x_i - \lfloor \Delta x_i \rfloor). \end{cases}$ It then outputs $(y_i + Z_i) \bmod q$ where $Z_i \sim \mathcal{D}_{/n}$.

  2. $\mathcal{A}$ decodes the result $r$ by returning $\hat{r} = \begin{cases} r/\Delta & \text{if } 0 \le r \le n\Delta, \\ n & \text{if } n\Delta + 1 \le r \le 2n\Delta, \\ 0 & \text{otherwise.} \end{cases}$

(Privacy) The proof of privacy proceeds identically to [7, Lemma 5.2] and is omitted here.

(Accuracy) Since $\Pi$ is an exact $\mathbb{Z}_q$-summation protocol, its output $r$ is exactly equal to $\tilde{u} \bmod q$ where $\tilde{u} = \sum_{i \in [n]} (y_i + z_i) = Z + \sum_{i \in [n]} y_i$ and $Z = z_1 + \cdots + z_n$ is distributed as $\mathcal{D}$.

Let $u = \sum_{i \in [n]} x_i$ be the true (unnoised) sum. The first step in our accuracy analysis is a claim (this claim is indeed our improvement over the error analysis of [7, Lemma 5.2]) that $|\hat{r} - u| \le |\tilde{u}/\Delta - u|$. To see that this is true, let us consider the following cases:

  • Case I: $\tilde{u} \notin (-n\Delta, 2n\Delta)$. In this case, $|\tilde{u}/\Delta - u| \ge n \ge |\hat{r} - u|$.

  • Case II: $\tilde{u} \in [0, n\Delta]$. We have $\hat{r} = \tilde{u}/\Delta$ and thus the inequality holds as an equality.

  • Case III: $\tilde{u} \in (-n\Delta, 0)$. We get $\hat{r} = 0$. Thus, $|\tilde{u}/\Delta - u| = u - \tilde{u}/\Delta \ge u - \hat{r} = |\hat{r} - u|$.

  • Case IV: $\tilde{u} \in (n\Delta, 2n\Delta)$. We get $\hat{r} = n$. Thus, $|\tilde{u}/\Delta - u| = \tilde{u}/\Delta - u \ge \hat{r} - u = |\hat{r} - u|$.

Thus, in all cases, we have $|\hat{r} - u| \le |\tilde{u}/\Delta - u|$. Therefore, the MSE is at most

$$\mathbb{E}\left[\left(\tilde{u}/\Delta - u\right)^2\right] = \mathbb{E}\left[(Z/\Delta)^2\right] + \sum_{i \in [n]} \mathbb{E}\left[(y_i/\Delta - x_i)^2\right] \le \frac{\mathrm{Var}(\mathcal{D})}{\Delta^2} + \frac{n}{4\Delta^2}.$$
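A minimal sketch of the randomizer/analyzer pair from the proof (our illustration; the noise share $Z_i \sim \mathcal{D}_{/n}$ is taken as an input, e.g. produced by the samplers of Section 5):

```python
import math
import random

def randomize(x: float, delta: int, q: int, noise_share: int) -> int:
    # R: randomized rounding of x in [0,1] to y_i, plus a share of D, mod q.
    fl = math.floor(delta * x)
    y = fl + (1 if random.random() < delta * x - fl else 0)
    return (y + noise_share) % q

def decode(r: int, n: int, delta: int) -> float:
    # A: map the exact mod-q sum r back to an estimate of sum_i x_i.
    if r <= n * delta:
        return r / delta
    if r <= 2 * n * delta:
        return float(n)
    return 0.0
```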

We can now prove Theorem 27 by plugging our noise into the above lemma.

Proof of Theorem 27.

Let $\Delta = \lceil e^{\varepsilon/3}\sqrt{n} \rceil$, $q = 2n\Delta$ and $\sigma = \log_2\left(\frac{e^{\varepsilon}+1}{\delta}\right)$. From Corollary 19, there is a zero-mean discrete infinitely divisible distribution $\mathcal{D}$ such that the $\mathcal{D}$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ where $\mathrm{Var}(\mathcal{D}) \le O(\Delta^2 e^{-2\varepsilon/3})$. Furthermore, Theorem 28 ensures that there exists an $n$-party $\sigma$-secure $m$-message exact $\mathbb{Z}_q$-summation protocol where $m = O\left(1 + \frac{\sigma}{\log n}\right) = O\left(1 + \frac{\varepsilon + \log(1/\delta)}{\log n}\right)$. Thus, Lemma 29 implies that there is an $(\varepsilon, \delta)$-shuffle-DP $m$-message protocol for real summation where each message is of length $O(\log q) = O(\varepsilon + \log n)$ bits and MSE $\frac{\mathrm{Var}(\mathcal{D})}{\Delta^2} + \frac{n}{4\Delta^2} \le O(e^{-2\varepsilon/3})$.

In the above proof, we use the noise from Corollary 19, which is a modified version of MSDLap. We note that the continuous noises in Section 4 or [39] cannot be used here, as the protocol must round each contribution to a finite group $\mathbb{Z}_q$ prior to summing. Neither the privacy nor the infinite divisibility of the resulting sum distribution is clear in this case.

7 Conclusion

This work closes the utility gap for infinitely divisible mechanisms in the high $\varepsilon$ pure DP regime. We find no separation in either the discrete or continuous settings when restricting the private mechanism to infinitely divisible noise addition. The "staircase-like" MSE of both GDL and MSDLap in the low-privacy regime makes them a natural replacement for the staircase mechanism in the discrete distributed setting, and we hope the results introduced here can be of practical value. We show one such practical application by extending the "split and mix" protocol of [30] under shuffle DP, resolving the open question posed in [23].

Beyond improving utility, we believe GDL is of independent interest for distributed private mechanism design due to the fact that it is closed under summation. This makes it well-suited for cases where "smooth" privacy guarantees are needed for multiple outcomes, or for a single deployed system (like a secure aggregation MPC protocol over $n$ clients, e.g. [9]) where different people can make different assumptions about what the honest fraction of the involved clients is. Additionally, this property is useful in post-hoc privacy loss analysis when honesty assumptions are broken in a production system, and where otherwise an analyst must resort to numerical approximation of the realized privacy loss.

In the continuous setting, our continuous transformations of all of the mechanisms in Section 3 outperform the existing Arete mechanism (Figure 2). We also note a strong resemblance between the Arete and the GDL distributions, as the NB distribution converges to the gamma distribution under certain conditions. See Section A.4 for more detail.

Finally, we hope our optimized MSDLap sampler Algorithm 3 or its constituent parts can be useful in any context where sparse, exact multivariate NB random generation is needed, or even where a general NB(r,p) sampler needs to be sublinear in r (Algorithm 1). For multivariate sampling when p is very close to 1, our approach should be a large improvement over standard methods. We note that Algorithm 2 can be extended in a straightforward way (due to Lemma 23) to support NB variates with different r values. The only change to the algorithm is initializing the urn with varying numbers of balls.

Open questions. In preparing this paper, we studied the Arete mechanism [39] extensively and are convinced it can also achieve the order-optimal MSE of $O(\Delta^2 e^{-2\varepsilon/3})$, but the formal proof eluded us. We present the following conjecture as an open question (even if this conjecture were proven, the constant factors for the Arete's MSE would still underperform the continuous multi-scale discrete Laplace):

Conjecture 30.

Let $\Delta \ge 1$ and $\varepsilon > 10$. The $Arete(\alpha, \theta, \lambda)$-noise addition mechanism with $\alpha = e^{-2\varepsilon/3}$, $\theta = (1+t)\frac{1}{\log 2}$, and $\lambda = (1+t)e^{-\varepsilon/3}$, where $t = o(1)$, is $\varepsilon$-DP for sensitivity $\Delta$ and has MSE $O(\Delta^2 e^{-2\varepsilon/3})$.

Our work also raises the following questions:

  • How close to the optimal MSE, including constants, can be achieved for continuous infinitely divisible mechanisms? While we match the constants of the discrete staircase for large $\varepsilon$, there is still a sizable gap in the continuous case.

  • Can the number of messages required for our shuffle-DP protocol in Theorem 27 be improved? Recall that we use the approach of [7] to translate a secure summation protocol to a shuffle-DP one. In fact, there is a more direct approach by [23] that achieves a smaller number of messages. Since their proof also uses infinitely divisible noises, it is plausible that our noise distribution can be used there. However, their proof is considerably more involved compared to [7] and does not use the noise distributions in a black-box manner.

References

  • [1] John M Abowd. The US Census Bureau adopts differential privacy. In KDD, pages 2867–2867, 2018.
  • [2] Naman Agarwal, Peter Kairouz, and Ziyu Liu. The skellam mechanism for differentially private federated learning. NeurIPS, pages 5052–5064, 2021.
  • [3] Horst Alzer and Graham Jameson. A harmonic mean inequality for the digamma function and related results. Rendiconti del Seminario Matematico della Università di Padova, 137:203–209, 2017.
  • [4] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.
  • [5] Andreas Athanasiou, Konstantinos Chatzikokolakis, and Catuscia Palamidessi. Enhancing metric privacy with a shuffler. In PETS 2025 - 25th Privacy Enhancing Technologies Symposium, 2025.
  • [6] Eugene Bagdasaryan, Peter Kairouz, Stefan Mellem, Adrià Gascón, Kallista Bonawitz, Deborah Estrin, and Marco Gruteser. Towards sparse federated analytics: Location heatmaps under distributed differential privacy with secure aggregation. Proceedings on Privacy Enhancing Technologies, 4:162–182, 2022. doi:10.56553/POPETS-2022-0104.
  • [7] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. Private summation in the multi-message shuffle model. In CCS, pages 657–676, 2020. doi:10.1145/3372297.3417242.
  • [8] James Henry Bell, Kallista A. Bonawitz, Adrià Gascón, Tancrède Lepoint, and Mariana Raykova. Secure single-server aggregation with (poly)logarithmic overhead. In CCS, pages 1253–1269, 2020. doi:10.1145/3372297.3417885.
  • [9] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In CCS, pages 1175–1191, New York, NY, USA, 2017. doi:10.1145/3133956.3133982.
  • [10] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In TCC, pages 635–658, 2016. doi:10.1007/978-3-662-53641-4_24.
  • [11] Clément L Canonne, Gautam Kamath, and Thomas Steinke. The discrete gaussian for differential privacy. NeurIPS, pages 15676–15688, 2020.
  • [12] Clément L. Canonne and Hongyi Lyu. Uniformity testing in the shuffle model: Simpler, better, faster. In SOSA, pages 182–202, 2022. doi:10.1137/1.9781611977066.13.
  • [13] Albert Cheu, Adam D. Smith, Jonathan R. Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via shuffling. In EUROCRYPT, pages 375–403, 2019. doi:10.1007/978-3-030-17653-2_13.
  • [14] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486–503, 2006. doi:10.1007/11761679_29.
  • [15] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006. doi:10.1007/11681878_14.
  • [16] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In SODA, pages 2468–2479, 2019. doi:10.1137/1.9781611975482.151.
  • [17] Quan Geng, Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The staircase mechanism in differential privacy. IEEE Journal of Selected Topics in Signal Processing, 9(7):1176–1184, 2015. doi:10.1109/JSTSP.2015.2425831.
  • [18] Quan Geng and Pramod Viswanath. The optimal mechanism in differential privacy. In ISIT, pages 2371–2375, 2014. doi:10.1109/ISIT.2014.6875258.
  • [19] Quan Geng and Pramod Viswanath. The optimal noise-adding mechanism in differential privacy. IEEE Transactions on Information Theory, 62(2):925–951, 2016. doi:10.1109/TIT.2015.2504967.
  • [20] Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. On the power of multiple anonymous messages: Frequency estimation and selection in the shuffle model of differential privacy. In EUROCRYPT, pages 463–488, 2021. doi:10.1007/978-3-030-77883-5_16.
  • [21] Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. Pure-dp aggregation in the shuffle model: Error-optimal and communication-efficient. In ITC, pages 4:1–4:13, 2024. doi:10.4230/LIPICS.ITC.2024.4.
  • [22] Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Rasmus Pagh. Private counting from anonymous messages: Near-optimal accuracy with vanishing communication overhead. In ICML, pages 3505–3514, 2020. URL: http://proceedings.mlr.press/v119/ghazi20a.html.
  • [23] Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Amer Sinha. Differentially private aggregation in the shuffle model: Almost central accuracy in almost a single message. In ICML, pages 3692–3701, 2021. URL: https://proceedings.mlr.press/v139/ghazi21a.html.
  • [24] Badih Ghazi, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Private aggregation from fewer anonymous messages. In Anne Canteaut and Yuval Ishai, editors, EUROCRYPT, pages 798–827, 2020. doi:10.1007/978-3-030-45724-2_27.
  • [25] Arpita Ghosh, Tim Roughgarden, and Mukund Sundararajan. Universally utility-maximizing privacy mechanisms. SIAM J. Comput., 41(6):1673–1693, 2012. doi:10.1137/09076828X.
  • [26] Slawomir Goryczka and Li Xiong. A comprehensive comparison of multiparty secure additions with differential privacy. IEEE transactions on dependable and secure computing, 14(5):463–477, 2015.
  • [27] Charlie Harrison. Distributed noise addition and infinite divisibility. https://charlieharrison.xyz/2024/08/05/distributed-noise.html, 2024. Accessed: 2025-02-11.
  • [28] Creighton Heaukulani and Daniel M Roy. Black-box constructions for exchangeable sequences of random multisets. arXiv preprint arXiv:1908.06349, 2019.
  • [29] Wolfram Research, Inc. Mathematica, Version 12.0, 2019. Champaign, IL, 2019.
  • [30] Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography from anonymity. In FOCS, pages 239–248, USA, 2006. doi:10.1109/FOCS.2006.25.
  • [31] Norman L. Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan. Discrete Multivariate Distributions, chapter 40, pages 200–203. Wiley, 1997.
  • [32] Charles H Jones. Generalized hockey stick identities and n-dimensional blockwalking. The Fibonacci Quarterly, 34(3):280–288, 1996.
  • [33] Peter Kairouz, Ziyu Liu, and Thomas Steinke. The distributed discrete gaussian mechanism for federated learning with secure aggregation. In ICML, pages 5201–5212, 2021. URL: http://proceedings.mlr.press/v139/kairouz21a.html.
  • [34] Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu. Practical and private (deep) learning without sampling or shuffling. In ICML, pages 5213–5225, 2021. URL: http://proceedings.mlr.press/v139/kairouz21b.html.
  • [35] Seetha Lekshmi V and Simi Sebastian. A skewed generalized discrete laplace distribution. IJMSI, 2:95–102, 2014.
  • [36] Ilya Mironov. On significance of the least significant bits for differential privacy. In CCS, pages 650–661, 2012. doi:10.1145/2382196.2382264.
  • [37] Ilya Mironov. Rényi Differential Privacy. In CSF, pages 263–275, 2017. doi:10.1109/CSF.2017.11.
  • [38] F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds. NIST Digital Library of Mathematical Functions. Release 1.2.2 of 2024-09-15, 2024. URL: https://dlmf.nist.gov/15.2.
  • [39] Rasmus Pagh and Nina Mesing Stausholm. Infinitely divisible noise in the low privacy regime. In ALT, pages 881–909, 2022. URL: https://proceedings.mlr.press/v167/pagh22a.html.
  • [40] Feng Qi. Bounds for the ratio of two gamma functions. Journal of Inequalities and Applications, 2010:1–84, 2010.
  • [41] Aaron Schein, Zhiwei Steven Wu, Alexandra Schofield, Mingyuan Zhou, and Hanna Wallach. Locally private bayesian inference for count models. In ICML, pages 5638–5648, 2019. URL: http://proceedings.mlr.press/v97/schein19a.html.
  • [42] F William Townes. Review of probability distributions for modeling count data. arXiv preprint arXiv:2001.04343, 2020.
  • [43] Filipp Valovich and Francesco Alda. Computational differential privacy from lattice-based cryptography. In International Conference on Number-Theoretic Methods in Cryptology, pages 121–141, 2017.
  • [44] Zheng Xu, Yanxiang Zhang, Galen Andrew, Christopher A. Choquette-Choo, Peter Kairouz, H. Brendan McMahan, Jesse Rosenstock, and Yuanbo Zhang. Federated learning of gboard language models with differential privacy. In ACL: Industry Track, pages 629–639, 2023. doi:10.18653/V1/2023.ACL-INDUSTRY.60.
  • [45] Mingyuan Zhou. Nonparametric bayesian negative binomial factor analysis. Bayesian Analysis, 13, April 2016. doi:10.1214/17-BA1070.

Appendix A

A.1 Deferred Proof of Lemma 12

Before we prove Lemma 12, it will be convenient to state the following simple lemma.

Lemma 31.

Let $f(x)$ be log convex on an interval $[a, b]$ containing 0. Then for any $x \in [a, b]$ and $\Delta \in \mathbb{R}_+$ such that $x + \Delta \in [a, b]$: $\frac{f(0)}{f(\Delta)} \ge \frac{f(x)}{f(x+\Delta)}$.

Proof.

From log-convexity, we have $f(0)^{\frac{\Delta}{x+\Delta}} f(x+\Delta)^{\frac{x}{x+\Delta}} \ge f(x)$ and $f(0)^{\frac{x}{x+\Delta}} f(x+\Delta)^{\frac{\Delta}{x+\Delta}} \ge f(\Delta)$. Multiplying the two yields the claimed inequality.

We are now ready to prove Lemma 12.

Proof of Lemma 12.

Assume w.l.o.g. that $x \le x'$. There are three cases to consider.

  1. $x < 0$ and $x' \le 0$. By symmetry, $\frac{f(x)}{f(x')} = \frac{f(-x)}{f(-x')} \le 1 \le \frac{f(0)}{f(\Delta)}$, since $-x \ge -x' \ge 0$ and $f$ is decreasing on $[0, \infty)$.

  2. $x < 0$ and $x' \ge 0$. We have $\frac{f(x)}{f(x')} \le \frac{f(0)}{f(x')} \le \frac{f(0)}{f(\Delta)}$, since $f(x) \le f(0)$ and $x' \le x + \Delta \le \Delta$.

  3. $x \ge 0$ and $x' \ge 0$. By $f$ decreasing on $[0, \infty)$, we have $\frac{f(x)}{f(x')} \le \frac{f(x)}{f(x+\Delta)}$. Applying Lemma 31 concludes the proof.

A.2 Alternative simplified GDL privacy bound

In this section we outline a tighter version of Corollary 14, which only well-approximates the privacy loss in the small $\beta$ regime. The bound below well-approximates the privacy loss in all $\beta$ regimes, at the cost of some added complexity in the expression.

Corollary 32 (to Theorem 13).

For any $\Delta \in \mathbb{N}$, $a > 0$, $\beta \in (0,1)$, the $GDL(\beta, a)$-noise addition mechanism is $\varepsilon$-DP for sensitivity $\Delta$ where $\varepsilon \ge a\Delta + (1-\beta)\log(\beta + \Delta) + \log(\Gamma(\beta))$.

Proof.

The proof is identical to Corollary 14, except in the last step, where we use Wendel's double inequality [40, eq. 2.6]:

$$\log\left(\frac{\Gamma(\Delta+1)\Gamma(\beta)}{\Gamma(\beta+\Delta)}\right) \le \log\left(\Gamma(\beta)\left(\Delta + \beta\right)^{1-\beta}\right) = (1-\beta)\log(\beta+\Delta) + \log(\Gamma(\beta)).$$

A.3 Analytical variance of the discrete staircase distribution

In this section we derive the analytical variance of the discrete staircase using the Mathematica software [29], for the purposes of generating Figure 1.

Figure 3: The Discrete Staircase PMF from [18, Figure 5].
Definition 33 ([18]).

The discrete staircase distribution with parameters $\Delta \in \mathbb{N}$, $\varepsilon > 0$, and $1 \le r \le \Delta$ is defined as

$$f_{DStair_{r,\varepsilon,\Delta}}(i) = \begin{cases} a(r) & 0 \le i < r \\ a(r)\, e^{-\varepsilon} & r \le i < \Delta \\ e^{-k\varepsilon}\, f_{DStair_{r,\varepsilon,\Delta}}(i - k\Delta) & k\Delta \le i < (k+1)\Delta \\ f_{DStair_{r,\varepsilon,\Delta}}(-i) & i < 0 \end{cases}$$

where $a(r) = \frac{1-b}{2r + 2b(\Delta - r) - (1-b)}$, $b = e^{-\varepsilon}$, and $k \in \mathbb{N}$.

Lemma 34.

Let z=eε1, then

Var(DStairr,ε,Δ)=x1+x2+x33z2(12r+eε(2r1)+2Δ)

Where

  • x1=2r3z33r2z2(z2Δ)

  • x2=rz(1+e2ε+6Δ(1+Δ)+eε(6Δ(Δ1)2))

  • x3=2eεΔ(1+4Δ2+cosh(ε)+2Δ2cosh(ε)3Δsinh(ε))

Proof.

Let $X \sim DStair(r, \varepsilon, \Delta)$. We will first compute the contribution to the variance from just the central "stair":

$$C = \sum_{x=-r+1}^{r-1} x^2\, a(r) = \frac{1}{3} r(r-1)(2r-1)\, a(r).$$

Next we compute the variance over the full support of the distribution:

$$\mathbb{E}[X^2] = \sum_{x=-\infty}^{\infty} x^2 f_{DStair_{r,\varepsilon,\Delta}}(x) = C + 2\sum_{k=1}^{\infty}\sum_{i=1}^{\Delta} \left((k-1)\Delta + r + i - 1\right)^2 f_{DStair_{r,\varepsilon,\Delta}}\left((k-1)\Delta + r + i - 1\right) = C + 2\sum_{k=1}^{\infty}\sum_{i=1}^{\Delta} \left((k-1)\Delta + r + i - 1\right)^2 a(r)\, e^{-k\varepsilon}.$$

The result follows from symbolic simplification in Mathematica, replacing $e^{\varepsilon} - 1$ terms with $z$ for ease of presentation.
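As a sanity check on the formula, the following snippet (ours) verifies that for $r = \Delta = 1$, where the discrete staircase coincides with $DLap(\varepsilon)$, Lemma 34 agrees with $\mathrm{Var}(DLap(\varepsilon)) = \frac{1}{\cosh(\varepsilon) - 1}$:

```python
import math

def dstair_var(r: int, delta: int, eps: float) -> float:
    # Closed form from Lemma 34.
    z = math.exp(eps) - 1.0
    x1 = 2 * r**3 * z**3 - 3 * r**2 * z**2 * (z - 2 * delta)
    x2 = r * z * (1 + math.exp(2 * eps) + 6 * delta * (1 + delta)
                  + math.exp(eps) * (6 * delta * (delta - 1) - 2))
    x3 = 2 * math.exp(eps) * delta * (-1 + 4 * delta**2 + math.cosh(eps)
                                      + 2 * delta**2 * math.cosh(eps)
                                      - 3 * delta * math.sinh(eps))
    denom = 3 * z**2 * (1 - 2 * r + math.exp(eps) * (2 * r - 1) + 2 * delta)
    return (x1 + x2 + x3) / denom

eps = 3.0
assert abs(dstair_var(1, 1, eps) - 1.0 / (math.cosh(eps) - 1.0)) < 1e-9
```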

We also derive the following concrete bound for large ε, which facilitates a comparison with our MSDLap noise.

Observation 35.

For any $\Delta \in \mathbb{N}$ and $\varepsilon \ge \log\left(\frac{\Delta(\Delta+1)(2\Delta+1)}{2}\right)$, the variance of the discrete staircase distribution (for any value of $r$) is at least $\frac{1}{e^{\varepsilon} + (2\Delta - 1)} \cdot \frac{\Delta(\Delta+1)(2\Delta+1)}{3}$.

Proof.

First, notice that, if $r > 1$, then $f_{DStair_{r,\varepsilon,\Delta}}(0) \le \frac{1}{3}$. Since the noise is zero-mean and at least $\frac{2}{3}$ of the mass lies on $|i| \ge 1$, the variance is thus at least $\frac{2}{3}$, which is at least $\frac{1}{e^{\varepsilon}+(2\Delta-1)} \cdot \frac{\Delta(\Delta+1)(2\Delta+1)}{3}$ by our condition on $\varepsilon$.

Next, consider the case $r = 1$. In this case, the variance is exactly

$$\sum_{i=-\infty}^{\infty} i^2 f_{DStair_{1,\varepsilon,\Delta}}(i) = 2\sum_{i=1}^{\infty} i^2 f_{DStair_{1,\varepsilon,\Delta}}(i) = 2\sum_{j=1}^{\Delta}\sum_{\ell=0}^{\infty} (j + \ell\Delta)^2\, f_{DStair_{1,\varepsilon,\Delta}}(j + \ell\Delta) = 2\sum_{j=1}^{\Delta}\sum_{\ell=0}^{\infty} (j + \ell\Delta)^2\, a(1)\, e^{-\varepsilon(\ell+1)}$$
$$\ge 2\sum_{j=1}^{\Delta}\sum_{\ell=0}^{\infty} j^2\, a(1)\, e^{-\varepsilon(\ell+1)} = 2a(1)\left(\sum_{j=1}^{\Delta} j^2\right)\left(\sum_{\ell=0}^{\infty} e^{-\varepsilon(\ell+1)}\right) = 2\left(\frac{1 - e^{-\varepsilon}}{1 + (2\Delta-1)e^{-\varepsilon}}\right)\left(\frac{\Delta(\Delta+1)(2\Delta+1)}{6}\right)\left(\frac{e^{-\varepsilon}}{1 - e^{-\varepsilon}}\right) = \frac{1}{e^{\varepsilon} + (2\Delta-1)} \cdot \frac{\Delta(\Delta+1)(2\Delta+1)}{3},$$

where $a(1)$ is as defined in Definition 33.

A.4 Arete convergence

In this section we show a link between the GDL distribution and the Arete distribution from [39].

Definition 36.

For $k, \theta, \lambda > 0$, let $Arete(k, \theta, \lambda)$ denote the distribution of $Z_1 - Z_2 + Z_3$ where $Z_1, Z_2 \overset{i.i.d.}{\sim} \Gamma(k, \theta)$ and $Z_3 \sim Lap(\lambda)$.

The main technical lemma linking the GDL and Arete follows from showing how the negative binomial distribution converges to the gamma distribution.

Lemma 37.

Let $X \sim NB\left(k, 1 - e^{-\frac{1}{\theta\Delta_d}}\right)$ and $Z_{\Delta_d} = X/\Delta_d$. Then $Z_{\Delta_d} \xrightarrow{dist} \Gamma(k, \theta)$ as $\Delta_d \to \infty$.

Proof.

By Lévy's continuity theorem, convergence in distribution follows from pointwise convergence of the characteristic functions. Denote by $\varphi_{\mathcal{D}}$ the characteristic function of the distribution $\mathcal{D}$. We first note the following facts:

  • $\varphi_{NB(r,p)}(t) = \left(\frac{p}{1 - e^{it}(1-p)}\right)^r$

  • $\varphi_{NB(r,p)/\Delta_d}(t) = \left(\frac{p}{1 - e^{it/\Delta_d}(1-p)}\right)^r$

  • $\varphi_{\Gamma(k,\theta)}(t) = (1 - \theta i t)^{-k}$

Finally,

$$\lim_{\Delta_d \to \infty} \varphi_{Z_{\Delta_d}}(t) = \lim_{\Delta_d \to \infty} \left(\frac{1 - e^{-\frac{1}{\theta\Delta_d}}}{1 - e^{(it - 1/\theta)/\Delta_d}}\right)^k \overset{\text{L'Hôpital}}{=} \left(\lim_{\Delta_d \to \infty} \frac{\frac{d}{d\Delta_d}\left(1 - e^{-\frac{1}{\theta\Delta_d}}\right)}{\frac{d}{d\Delta_d}\left(1 - e^{(it - 1/\theta)/\Delta_d}\right)}\right)^k = \left(\lim_{\Delta_d \to \infty} \frac{-e^{-\frac{1}{\theta\Delta_d}} / (\theta\Delta_d^2)}{e^{(it - 1/\theta)/\Delta_d}\, (it - 1/\theta) / \Delta_d^2}\right)^k = \left(\frac{-1/\theta}{it - 1/\theta}\right)^k = \left(\frac{1}{1 - \theta i t}\right)^k = \varphi_{\Gamma(k,\theta)}(t).$$

Proposition 38.

Let $Z_{\Delta_d} \sim Lap(\lambda) + GDL\left(k, \frac{1}{\theta\Delta_d}\right)/\Delta_d$. Then $Z_{\Delta_d} \xrightarrow{dist} Arete(k, \theta, \lambda)$ as $\Delta_d \to \infty$.

Proof.

This follows from Definition 9, Definition 36, and Lemma 37.

 Remark 39.

The convergence result in Proposition 38 shows that the Arete mechanism is quite similar to the GDL mechanism transformed to real support via the approach in Theorem 21. The primary difference is how large $\Delta_d$ gets. Using Theorem 21, we only set $\Delta_d = O(e^{\varepsilon/3})$, allowing the resulting (discrete) distribution to have "holes" in its support. The purpose of the Laplace noise in that case is to smooth out the holes and ensure support on $\mathbb{R}$. On the other hand, the Arete (via Proposition 38) requires $\Delta_d \to \infty$ (with no holes in its support), and the purpose of the Laplace noise is to smooth out the resulting singularity at 0 for sufficiently small values of $k$ (see [39, Page 3] for further discussion on this point). As such, the proof technique for Theorem 21 cannot be immediately used to help prove (or disprove) Conjecture 30.

A.5 Parameterized Difference Set and The multi-scale discrete Laplace Mechanism

Recall that, for privacy analysis, we usually only consider the sensitivity of the function $q$, which is defined as $\Delta(q) = \max_{x, x'} |q(x) - q(x')|$ where the maximum is over all pairs $x$ and $x'$ differing on one entry. In this section, we show that, if we parameterize the potential $q(x) - q(x')$ values in a more fine-grained manner, we can achieve an improved error in certain cases.

For a given query function $q: \mathcal{X}^d \to \mathbb{Z}$, we define the difference set of $q$, denoted by $S_{\mathrm{diff}}(q)$, as the set of all possible values of $|q(x) - q(x')|$ over all pairs $x$ and $x'$ differing on one entry. If $S_{\mathrm{diff}}(q)$ is finite, we let the $S_{\mathrm{diff}}(q)$-multi-scale discrete Laplace mechanism be the mechanism that outputs $q(x) + \sum_{i \in S_{\mathrm{diff}}(q)} i X_i$ where $X_i \overset{i.i.d.}{\sim} DLap(\varepsilon)$ for all $i \in S_{\mathrm{diff}}(q)$.

Theorem 40.

For any query function $q: \mathcal{X}^d \to \mathbb{Z}$ such that $S_{\mathrm{diff}}(q)$ is finite, the $S_{\mathrm{diff}}(q)$-multi-scale discrete Laplace mechanism is $\varepsilon$-DP. Furthermore, for $\varepsilon \ge 1$, its MSE is $O\left(e^{-\varepsilon} \sum_{i \in S_{\mathrm{diff}}(q)} i^2\right)$.

Proof.

(Privacy) To show that this mechanism is $\varepsilon$-DP, it suffices to show that $D_\infty\left(q(x) + \sum_{i \in S_{\mathrm{diff}}(q)} i X_i \,\middle\|\, q(x') + \sum_{i \in S_{\mathrm{diff}}(q)} i X_i\right) \le \varepsilon$ for any pair $x, x' \in \mathcal{X}^d$ that differs on one entry. Consider any such fixed pair $x, x'$. Let $\xi = q(x) - q(x')$; due to symmetry of the noise around zero, we may assume that $\xi \ge 0$. If $\xi = 0$, the statement is clearly true. Otherwise, from the definition of $S_{\mathrm{diff}}$, we have $\xi \in S_{\mathrm{diff}}(q)$.

From Lemma 6, we have

$$D_\infty\left(q(x) + \sum_{i \in S_{\mathrm{diff}}(q)} i X_i \,\middle\|\, q(x') + \sum_{i \in S_{\mathrm{diff}}(q)} i X_i\right) = D_\infty\left(\xi + \sum_{i \in S_{\mathrm{diff}}(q)} i X_i \,\middle\|\, \sum_{i \in S_{\mathrm{diff}}(q)} i X_i\right) \le D_\infty\left(\xi + \xi X_\xi \,\middle\|\, \xi X_\xi\right) = D_\infty\left(1 + X_\xi \,\middle\|\, X_\xi\right) \le \varepsilon,$$

where the last inequality follows from $X_\xi \sim DLap(\varepsilon)$. Thus, the mechanism is $\varepsilon$-DP.

(Accuracy) The MSE of the mechanism is

$$\mathrm{Var}\left(\sum_{i \in S_{\mathrm{diff}}(q)} i X_i\right) = \sum_{i \in S_{\mathrm{diff}}(q)} i^2\, \mathrm{Var}(X_i) = O\left(e^{-\varepsilon} \sum_{i \in S_{\mathrm{diff}}(q)} i^2\right).$$

To see the advantage of the above mechanism, we note a few scenarios where it has MSE $O(\Delta^2 e^{-\varepsilon})$, but where approaches which consider the sensitivity alone must have MSE at least $\Omega\left(\min\{\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3}\}\right)$ [18, 17, 19], which is asymptotically larger. In each example we consider a query $q$ with sensitivity $\Delta$.

  • $S_{\mathrm{diff}}(q) \subseteq [\Delta^{2/3}] \cup \{\Delta\}$, i.e., there is one large possible difference, but possibly many small ones far from $\Delta$.

  • $S_{\mathrm{diff}}(q) = \{n^m : m \in \mathbb{Z},\ 0 \le m \le \log_n(\Delta)\}$ for a fixed $n > 1$, i.e., the differences are structured to form exponential buckets.

Finally, we note that the setting where $|S_{\mathrm{diff}}(q)|$ is small (even when $\Delta(q)$ is large) can occur in practice. As an example, imagine a simple merchant that sells items whose prices are all in the set $S = \{5, 10, 30, 100\}$ and wants to privately sum a database of sales where each row is the sale price of a single item. If a neighboring dataset adds or removes a row, it is clear the sensitivity of this query is $\Delta = 100$. For $\varepsilon = 10$, the continuous staircase will have MSE 8.5, but the MSDLap mechanism with $S_{\mathrm{diff}}(q) = S$ will have MSE 1.0 (MSE results rounded to two significant figures), resulting in nearly an order of magnitude improvement. A quick numeric check follows.
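The following snippet (ours) reproduces the MSDLap figure in this example:

```python
import math

# S_diff = {5, 10, 30, 100}, eps = 10:
# MSE = sum_{i in S} i^2 * Var(DLap(eps)), with Var(DLap(a)) = 1/(cosh(a) - 1).
S, eps = [5, 10, 30, 100], 10.0
mse = sum(i * i for i in S) / (math.cosh(eps) - 1.0)
print(round(mse, 1))  # ~1.0, versus ~8.5 for the continuous staircase at Delta = 100
```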