Abstract 1 Introduction 2 Preliminaries 3 Public-Coin via Domain Compression 4 Low-Bandwidth Private-Coin via Hadamard Matrices 5 Symmetric Locally Private Testing 6 Conclusion References

Uniformity Testing Under User-Level Local Privacy

Clément L. Canonne ORCID The University of Sydney, Australia Abigail Gentle ORCID The University of Sydney, Australia Vikrant Singhal ORCID Harvard University, Cambridge, MA, USA
Abstract

We initiate the study of distribution testing under user-level local differential privacy, where each of n users contributes m samples from the unknown underlying distribution. This setting, albeit very natural, is significantly more challenging than the usual locally private setting, as for the same parameter ε the privacy guarantee must now apply to a full batch of m data points. While some recent work considers distribution learning in this user-level setting, nothing was known for even the most fundamental testing task, uniformity testing (and its generalization, identity testing).

We address this gap, by providing (nearly) sample-optimal user-level LDP algorithms for uniformity and identity testing. Motivated by practical considerations, our main focus is on the private-coin, symmetric setting, which does not require users to share a common random seed nor to have been assigned a globally unique identifier.

Keywords and phrases:
Differential Privacy, Local Differential Privacy, Uniformity Testing, Identity Testing, Hypothesis Testing, User-Level Differential Privacy, Person-Level Differential Privacy
Funding:
Clément L. Canonne: Supported by an ARC DECRA (DE230101329).
Abigail Gentle: Supported by an Australian Government Research Training Program (RTP) Scholarship.
Vikrant Singhal: Supported in part by NSF grant BCS-2218803, the D3 Trustworthy AI Lab, and a grant from the Sloan foundation.
Copyright and License:
[Uncaptioned image] © Clément L. Canonne, Abigail Gentle, and Vikrant Singhal; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Security and privacy
; Security and privacy Usability in security and privacy ; Security and privacy Privacy protections ; Theory of computation Theory of database privacy and security ; Mathematics of computing Hypothesis testing and confidence interval computation
Related Version:
Full Version: https://arxiv.org/abs/2510.18379
Editor:
Shubhangi Saraf

1 Introduction

We consider the problem of uniformity testing (equivalently, identity testing [29, 24]) of distributions in the setting where each of n distributed users hold m independent and identically distributed observations from some unknown common distribution. This naturally captures many real-world statistical scenarios, such as when data is distributed among many users’ personal devices.

Consider the testing equivalent of the problem specified in [16], where the goal is to learn users’ most-used emoji. In this practical deployment, users are queried once per day with a prespecified privacy budget. Of course, people typically use a lot more than a single emoji during that time period, and so one would hope to obtain information about many emojis at once, from each user. Yet, despite users “sampling” from the “distribution” of emojis multiple times per day, the m=1 setting explored in earlier literature can only make use of one sample.

Can we leverage the fact that each user holds many samples to test the underlying distribution more efficiently, while still preserving each individual’s privacy as a whole?

To formalize this question, we work in the framework of Differential Privacy (DP) [26], specifically Local Differential Privacy (LDP) [33, 38], where data is made private before it leaves the device. This setting is of practical interest as data collection has grown massively, and many users are hesitant to trust a central curator with collecting and storing their data non-privately. Furthermore it has received theoretical attention as a well-parameterized model of learning under constrained information per-sample [2]. However, the usual setting of “item-level” (i.e., single-sample) LDP is ill-suited to our goal, which is to capture the fact that each user can contribute many samples: naïvely, this would correspond to viewing the data of each user as an m-tuple, blowing up the domain size from k to km and leading to severely suboptimal algorithms. Instead, we will work in the more stringent setting of user-level LDP, whereby the privacy guarantee applies to the whole data held by any single user (see Figure 1 and Section 2 for an illustration and definition).

Figure 1: Graphical representation of user-level local differential privacy. Each user holds m samples of some unknown distribution 𝐩, and must be guaranteed privacy of all m samples at once. As pictured, each user sends a single private message to the server, which must somehow carry information about all of their samples.

As we elaborate in Section 1.1, user-level LDP is a much more stringent and challenging setting, as (roughly speaking) the algorithm’s output must be remain similar even when the data of an entire user – a full batch of m samples! – is changed arbitrarily.

As our focus is on capturing practically motivated settings, we also pay special attention to the type of algorithm we design, and how realistic their deployment would be. For instance, adaptive protocols, in which users interact sequentially and adapt their output to the previous message observed, are typically undesirable, as they introduce latency and a host of technical implementation challenges.

For this reason, we consider non-adaptive protocols for this problem; which themselves come in several varieties. In particular a protocol can be public-coin or private-coin, and symmetric or asymmetric. A public-coin protocol assumes the existence of a shared random seed between all participants, while a private-coin protocol does not. Symmetric protocols are such that each user runs the same algorithm locally with the same parameters, while an asymmetric protocol allows some variation between users, as decided by the central data collector or curator. These models capture the level of coordination between the curator and the distributed users such that a public-coin asymmetric protocol is the most coordinated, and a private-coin symmetric protocol is the least. For this reason we will progress through the models from most-coordination to least, as the latter model is more practical and thus preferable. Of course, if the m=1 case is any indication, this practicality may come at a cost, in terms of per-user communication [9] required or overall sample complexity [2].

Can we obtain symmetric, private-coin testing algorithms for user-level LDP uniformity testing? What is the cost of achieving these two desirable features, in terms of sample and communication complexity?

To ground these distinctions and motivate the question further, one can consider the following tasks that are easy in one model, but hard in its complement. A public-coin protocol may assume that all users can apply the same randomly sampled hash function to their data, while a private-coin protocol requires that the hash is either determined in advance, or sampled independently and sent by each user to the server. An asymmetric protocol may easily partition the users into groups of equal size, while to achieve the same effect a symmetric protocol must have users randomly partition themselves and send which group they landed in. This runs into the canonical coupon collector problem, where some groups will be under-sampled, requiring a more thorough analysis, or more samples.

1.1 Overview of our Results and Techniques

In what follows, we focus on uniformity testing, that is, the task of testing whether the unknown distribution 𝐩 is uniform on the domain, or at distance at least α (in total variation distance) from uniform. As discussed in Section 1.2, by a now-standard reduction this directly implies the analogous statements for identity testing, where the reference distribution is a fixed, known reference 𝐪 instead of the uniform distribution 𝐮k.

Our setting (formally described in Section 2) involves n users, each holding m1 i.i.d. samples from an unknown distribution 𝐩 over a known domain of size k2. They engage in a distributed protocol, either public- or private-coin, to enable a central server to perform uniformity testing on this underlying distribution in a locally private manner, with privacy parameter ε>0. Our specific focus will be on the regime 1mk/α2, arguably the most natural and relevant, where each user has a possibly large number of observations, but not enough that they could run a uniformity testing algorithm by themselves:111Indeed, if mk/α2, then each user has enough samples to perform uniformity testing by themselves, and so the question boils down to communicating the answer to the server in a locally private manner, for which n=O(1/ε2) users is sufficient. When m is even larger (specifically, at least the sample complexity of uniformity testing in the central DP model, then n=1 is enough: a single user can run the DP algorithm and send the outcome. we implicitly place ourselves in this regime throughout.

Our first “warm-up” result is a public-coin algorithm222Throughout, and in line with the local privacy literature, we will use “algorithm” and “protocol” interchangeably. which shows that, for uniformity testing, user-level LDP with m samples for each of n user behaves similarly to LDP with one sample for each of mn users:

Theorem 1 (See Theorem 8).

There exists an asymmetric, public-coin, user-level locally differentially private algorithm for uniformity testing (over domain of size k) which on privacy and distance parameters ε>0 and α(0,1] takes

n=O(kmα2ε2)

users, each holding m i.i.d. samples from the unknown distribution and sending one bit of communication.

Whether this result seems unexpected or not to the reader, one interesting aspect is that this result essentially follows from combining, in a very simple way, two known algorithmic building blocks: the first is domain compression [6, 7], a type of hashing which, using shared randomness, allows us to to reduce the domain size without increasing the total variation distance “too much”. The second is the user-level LDP coin learning (asymmetric) protocol of [8], which lets us learn the bias of a single Bernoulli (and so, a fortiori, test it). Combining the two is then rather straightforward, modulo some bookkeeping details: (1) use domain compression to reduce the uniformity testing instance from (k,α) to (2,α/k); then (2) (privately, with m samples) learn the bias of the resulting “coin” to accuracy α/(2k).

In view of the simplicity of the first algorithm, it is natural to expect the private-coin algorithm to be similarly straightforward. As we discuss shortly, this expectation turns out to be significantly optimistic: nonetheless, we are able to establish the following, providing an analogous result in the private-coin setting. (To interpret it, it is useful to remember that, for m=1, the sample complexity of private-coin locally private uniformity testing is known to be Θ(k3/2/(α2ε2)).)

Theorem 2 (1-bit user-level LDP uniformity testing).

There exists a private-coin, user-level locally differentially private algorithm for uniformity testing (over domain size k) which for small privacy parameter ε>0, and distance parameter α(0,1] takes

n=O(k3/2log(k/r)mα2ε2)

users, each holding m i.i.d. samples from the unknown distribution. If the protocol is asymmetric, r=ε and each user sends 1 bit of communication. If the protocol is symmetric, then r=εαm and each user sends O(logk) bits of communication.

One may wonder whether this trade-off between symmetry and communication complexity is inherent: by a simple modification of an argument of Acharya and Sun [9] (originally in the context of (item-level) locally private distribution learning), we provide strong evidence that this trade-off is necessary for at least some parameter regime,333We cannot hope to show this trade-off for all values of m, since for mk/α2, as discussed before, there is a trivial, private-coin, symmetric protocol with a single bit per user. showing that this is the case for m=1:

Proposition 3.

Any symmetric, private-coin algorithm for uniformity testing (over domain of size k) with distance parameter α1/k and m=1 sample per user requires at least log2k bits of communication per user. (This holds regardless of whether the algorithm is locally private or not.)

For completeness, we give a proof of this result in the full version. While only stated for very small α, this result shows that one cannot achieve constant communication complexity per user with symmetric private-coin protocols.

Underlying our results is a technical lemma, likely of independent interest, which gives a way to compress many samples into one bit. This compression, which is likely to also find applications in the (non-private) bandwidth-constrained model, enables us to reduce the task to the much simpler task of privatizing a single bit, rather than a higher-dimensional message. Phrased in terms of differential privacy, this considerably reduces the sensitivity of our randomizers.

Lemma 4 (Informal statement of Lemma 12).

Let XBin(m,1/2+α) with α[1/2,1/2]. Then, the indicator variable

Y𝟙{Xm/2}

is distributed as YBern(1/2+β) where

|β|=Ω(min(mα,1)).

We briefly describe how the above lemma may lead to Theorem 2. The first (by now somewhat standard) idea is to use a good error-correcting code such as the Hadamard code to define a family of k sets444For simplicity of presentation, we implicitly assume in this overview that k is a power of 2, that m is odd, and that everything is a multiple of what it needs to be. χ1,,χk[k], each of size k/2, such that

j=1k(𝐩(χj)12)2dTV(𝐩,𝐮k)2 (1)

(note that since each set has size k/2, if 𝐩=𝐮k then the sum is 0). Then we can partition the users into k groups, where the users of group j “monitor” χj: by computing how many of their m samples fall into their assigned set χj. That is, each user of a given group j now observes a random variable distributed as

Bin(m,𝐩(χj)),

where we can rewrite 𝐩(χj)=1/2+αj. By (1), we then have that j=1kαj2 is either 0 (if 𝐩 is uniform) or at least α2 (𝐩 is far from it). Now, our user has a Bin(m,𝐩(χj)) in hand, and we would like them to send a single bit: this is where Theorem 2 comes in, letting the user send the bit obtained by thresholding their observation at m/2. This yields a bit which has either bias 0 (if 𝐩 is uniform, since m is odd) or some bias βj such that βj=Ω(min(mαj,1)).

By piecing together (and centreing) the bits from one user of each of the k groups, we can view it as one k-dimensional random bit vector in {1,1}k, with mean (β1,,βk).

This enables us to reduce our multi-sample-per-user uniformity testing problem to (privately) testing whether a product distribution over {1,+1}k is uniform or has mean vector with norm at least

j=1kβj2j=1kmin(mαj2,1). (2)

The good news is that we now longer have to worry about user-level privacy: each user only needs to make their one output bit ε-LDP, which is easy to achieve via Randomized Response: by standard arguments, this only changes the mean testing problem by replacing the parameter β2:=j=1kβj2 by O(ε2β2). All that remains after that is to use an out-of-the-box (non-private) algorithm for mean testing of Rademacher-product distributions such as the ones in [20, 21], (not quite) establishing Theorem 2.

Are we there yet?

There are still, unfortunately, a few annoying issues with the above outline. The first is the min in (2): we would like to lower bound j=1kβj2 by mj=1kαj2mα2, but this only allows us to get min(mα2,1), which is (much) weaker for large m. Moreover, this is actually unavoidable in our reduction to mean testing of Rademacher-products!

The key insight is that this is only an issue when some of the sets χj have very large bias (high or low probability) under 𝐩, in which case mαj1. We do not have control over this – it can happen! – but this should be an easy case to detect. And indeed, we can run, on a small number of users, an alternative protocol to detect whether there exists a 1jk with αj=|𝐩(χj)1/2|1/m. By standard concentration arguments for maximum of Binomials, losing only a logk factor, we can argue that, with high probability, this will not lead to erroneously rejecting the uniform distribution (𝐩(χj)=1/2 for all j), but will detect if any of these 𝐩(χj) is overly biased. If this test passes, then whether 𝐩 is uniform or not, we know that all αj are small enough, and so j=1kβj2mα2logk.

This leads us to the second annoying issue: namely, that the above outline leads to a good sample complexity, but runs two distinct sub-protocols, one of them partitioning our n users in k distinct groups: this is very much an asymmetric protocol, with users running k+1 distinct randomizers depending on their identity. To handle this, there is an “obvious”, general-purpose solution: let each user pick uniformly at random which of these k+1 randomizers to run, and send both the (private) output and the (non-private) index of the randomizer they picked. By a coupon collector argument, this works with high probability, at the cost of a logarithmic factor in the number of users (and an O(logk) additional bits of communication per user).

As Proposition 3 asserts, the latter is necessary; the former, however, is very much wasteful. To avoid paying this logarithmic factor, we provide a generalization of the result for mean testing for Rademacher-product distributions, which allows for a different (and random) number of observations per coordinate and which we believe is of independent interest (Theorem 16). Its analysis, while intuitively simple, is rather technical and relies on a symmetrization argument by negative association: we provide it in the full version.

Finally, we observe that while our focus is privacy, we obtain novel results for the bandwidth-constrained setting [4] as well, where each user has limited communication budget with which to share their data.

Naïve approaches: “why did you not simply do this?”

When considering this model, the first and most natural approach is to consider the composition theorems of differential privacy. Simple composition of differential states that sending T messages, each with ε-differential privacy, results in a final privacy loss of Tε. We use these theorems by pretending that there are nm total users, each with m=1, and each guaranteed privacy ε=ε(ε,m). Even in the most generous case when our sample complexity is some n1/ε, this results in no apparent gains from the additional samples over simply picking one of the m uniformly at random. In reality the situation is worse, as we frequently have n1/ε2, meaning that sending m messages, each with privacy parameter ε/m will result in an even worse final sample complexity.

There also exists advanced composition [28, 27] that say that the same composition satisfies approximate555We do not further consider approximate-DP in this work, however one can informally think of δ as the “probability of not being ε-DP”. (ε,δ)-differential privacy with final ε=ε2Tlog(1/δ)+Tε(eε1). Applying this once again to some arbitrary problem with n1/ε2, we see that our final sample complexity is again nmm/ε2, yielding no gains.

As mentioned earlier, another simple approach exists when m is large enough for each user to (non-privately) test on their own. In this regime we have each user test locally, reporting a bit that indicates accept or reject. We then simply learn the average bit using ε-LDP, for which n=O(1/ε2) is sufficient. Therefore, for the very specific regime when m>k/α2 and n>1/ε2, we have a final sample complexity nm=O(k/(α2ε2)). The existence of this approach allows us to assume mk/α2 for the remainder of this work.

A slightly less naïve approach (and why it is not enough).

Another approach one could consider is one based on testing via repetition, used in [23, Section 2] for uniformity testing in the streaming setting. Assuming that mk, we can run the following “privatized” version of this protocol. For each user, i, and each domain element j[k], let Ni,j the number of samples of user i that are j. Then each user computes the statistic,

Zi=1kj=1k𝟙{Ni,j=0}

and applies an additive ε-LDP noise mechanism. Then, the final statistic computed by the server would be the average of the private versions of all the Zi’s, i.e., the private version of Z=1ni=1nZi. To make each Zi private, each user would need to add noise to Zi that is calibrated to the sensitivity, Δ, of Zi (i.e., the effect of changing all the m samples of that user on Zi in the worst case – if Zi is the new statistic after changing all of user i’s samples, then the sensitivity is the maximum possible value of |ZiZi|). In this case, the sensitivity is (m1)/k. The private version of Zi is then simply Z~i=Zi+τi, where τi𝖫𝖺𝗉(mε), and correspondingly the private version of Z is Z~=1ni=1nZ~i=Z+τn, where τ=τ1++τn.

Now, by linearity of expectations, 𝔼[Z~]=𝔼[Z]+𝔼[τ/n]=𝔼[Z]. Then for 𝐩 α-far from 𝐮, following the proof of [23, Section 2] we get

𝔼𝐩[Z~]𝔼𝐮[Z~]m2α24ek2T

and, since τ and Z are independent, Var𝐩(Z~),Var𝐮(Z~)2m2nk3+2Δ2nε2. By Chebyshev’s inequality, we then get that a protocol which computes this Z~ at the server and thresholds its value at 𝔼𝐮[Z~]+T/2 gives a valid (private, private-coin) uniformity testing algorithm as long as

nm2kα4+k2ε2α2.

Even though this is an improvement over the previously outlined methods, the sample complexity still falls short of what we are hoping for. One may hope to improve it by using techniques like the sensitivity-reducing mapping from [13, 12] that could potentially reduce the sensitivity Δ from m/k to 1/k. However, even in the best possible case, that would only improve the sample complexity to

nm2kα4andnm4k2ε2α2.

As we see, even this best-case scenario would not improve the dependence on k, and be very costly for small m.

1.2 Prior work

Uniformity testing was first considered in the context of theoretical computer science by [30], and the optimal sample complexity Θ(k/α2) was obtained in [35]. These results have since been generalized to the identity testing problem, where the reference distribution is not uniform, a problem later proven to be formally equivalent to uniformity testing [25, 29, 24]. The task has since also been considered under (differential) privacy, first in the central model of differential privacy [18], where the tight sample complexity was later shown to be Θ(k/α2+k/(αε)+k1/3/(α4/3ε2/3)+1/αε) [10, 14].

Under the more restrictive model of local differential privacy, the question of testing was raised in [36] and has since been wholly resolved. It is known now that the tight sample complexity for non-interactive private-coin protocols is Θ(k3/2/(α2ε2)), while non-interactive public-coin protocols achieve Θ(k/(α2ε2)) [2, 5, 1]. Furthermore, allowing interactivity cannot improve the sample complexity beyond the bound given in the public-coin setting [3, 15, 37].

In the shuffle model of differential privacy, where users’ responses are permuted before being received by the central curator, an upper bound of O((k3/4log(1/δ))/(αε)+k/α2) was shown in [22]. The best known lower bounds were derived from a connection to pan-privacy [17].

The user-level setting is a natural generalization of any distributed statistical problem. In practice, it is restrictive to consider users holding only one sample from the distribution of interest whether we are testing, learning, or performing any other distributed task. For this reason, distribution learning with multiple samples was studied under bandwidth constraints, where an intricate interplay between the number of samples per-user and the number of bits available to communicate the samples was shown [4]. Under user-level differential privacy, these results were extended to show that (among other things) for small ε, one achieves a sample complexity for learning of n=O(k2/(mα2ε2)). Comparing this to the case when m=1 (For example [11]) we can see that the risk decreases linearly with m. We could say that for the task of learning, having m samples per-user is as beneficial as having m times more users to query.

Even though there is no direct reduction between the two settings, and (as inferred from the previous discussion) the situation, as m grows, becomes quite subtle, this gives strong evidence that our results which show a sample complexity improvement by a factor m are tight, at least in the most relevant parameter regime for m.

Organization

The remainder of this paper is organized as follows. Section 2 establishes necessary facts and definitions we use throughout the paper, including the necessary background on differential privacy, distribution testing, and some concentration inequalities important to this work. In Section 3, we present a result for the public-coin setting, combining known theorems from distribution testing and user-level private learning. Section 4 introduces our main technical results in the form of non-private protocols that our private protocols are built upon. In particular, we show new algorithms for various testing problems, as well as several technical lemmata we believe are of independent interest. Due to the length and technical depth of this section, we defer the question of establishing privacy to Section 5, where we show how to adapt the results of the previous section to the user-level local differential privacy setting. Finally, we conclude with some open questions surrounding user-level local differential privacy. Wherever a proof, lemma, or algorithm is deferred from the main body, we include an appropriate reference to the full version.

2 Preliminaries

We use n to refer to the number of users participating in the protocol (or in the dataset) and m to refer to the number of samples (or data points) each user has. Additionally, we use k to denote the domain size, ε to denote the privacy parameter, α to denote the accuracy parameter, and β to denote the probability of failure (in terms of accuracy) of our protocol. We also write [k]:={1,2,,k}.

Next, for a distribution 𝐩, 𝐩m refers to the m-dimensional product distribution with each marginal 𝐩. For two distributions 𝐩 and 𝐪 over the same (countably infinite) domain Ω, we denote by dTV(𝐩,𝐩) the total variation distance between them, defined as

dTV(𝐩,𝐪)=supSΩ(𝐩(S)𝐪(S))=12xΩ|𝐩(x)𝐪(x)|.

In the context of Bernoulli and Binomial distributions with parameter p, we call their deviation from 1/2 (i.e., p1/2) their bias.

Finally, we use the notation , , and to denote the (sometimes slightly more convenient) analogues of the Ω(), O(), and Θ() notation: specifically, for two sequences (an)n, (bn)n indexed by some parameter n, we write anbn if there exists C>0 such that anCbn for every n0, with the inequality reversed for . anbn then denotes that both anbn and anbn hold. Throughout, and denote minimum and maximum: ab=min(a,b), and ab=max(a,b).

2.1 Differential Privacy

We first provide the necessary notions and results from the differential privacy (DP), starting with the definition of local differential privacy (LDP).

Definition 5 (Local Differential Privacy [32, 38]).

For ε>0, a randomized algorithm Q:𝒳𝒴 provides ε-Local Differential Privacy if for all i,j𝒳,

maxy𝒴Pr[Q(i)=y]Pr[Q(j)=y]eε.

While Local Differential Privacy is somewhat involved to define for interactive protocols, where each user can send (in an adaptive manner) several messages, it is simpler in our setting. We consider non-interactive protocols, where each user only sends one message to the server. When each user holds several datapoints (that is, 𝒳=[k]m), the above definition then directly corresponds to the user-level LDP guarantee considered in this paper.

We will rely on the (binary) Randomized Response mechanism, which is an optimal ε-local differential privacy protocol for 1-bit inputs [33]. Formally, let Q be the local randomizer for this protocol where Q(x) is a random variable, and Q(y|x) is the probability of seeing output y on input x.

Q(y|x)={eεeε+1y=x1eε+1yx.

2.2 Distribution and Uniformity Testing

Here, we formally state the problem and setting. We are interested in the problem of uniformity testing with multiple samples per user. Specifically, given a discrete distribution 𝐩 over k elements, which we assume without loss of generality is over the domain [k], each user i=1,,n holds a multi-sample Xi𝐩m which is an m-dimensional vector of i.i.d. samples from 𝐩. We are interested in algorithms that can distinguish between the cases where dTV(𝐩,𝐮)=0 and dTV(𝐩,𝐮)>α via the fewest possible number of users n. As mentioned in the introduction, barring some kind of information constraint per-user, this problem reduces to the well-studied case where m=1 by having each user send all m of their samples and treating each as its own message. This problem becomes non-trivial when we are either bandwidth-constrained or privacy-constrained. In the former, users have up to bits with which to communicate their samples. In the latter, the user is constrained by differential privacy.

In the user-level LDP setting, upon receiving n private responses Y1,,Yn from the users, the server must distinguish between the two cases:

  • dTV(𝐩,𝐮)=0, and

  • dTV(𝐩,𝐮)>α,

with probability at least 2/3, while satisfying ε-user-level LDP. (The threshold 2/3 is somewhat arbitrary, and can be amplified to any 1β by standard arguments at a sample complexity cost of O(log(1/β)).)

2.3 Useful Probability Tools

We first recall Cantelli’s inequality, a one-sided version of Chebyshev’s inequality:

Lemma 6 (Cantelli’s inequality).

Let X be a real-valued random variable with finite variance. Then, for every λ>0,

Pr[X𝔼[X]+λ],Pr[X𝔼[X]λ]Var[X]Var[X]+λ2.

We will also require the following standard tail bound for subgaussian random variables (see e.g., [31]):

Lemma 7.

Let X1,,Xn be (not necessarily independent) σ2-subgaussian random variables with mean zero. Then

𝔼[max1inXi]2σ2logn

and, for every t>0,

{max1inXi2σ2(logn+t)}et.

In particular, this implies that, for every t>0,

{max1in|Xi|2σ2(log(2n)+t)}et.

3 Public-Coin via Domain Compression

In this section, as a warmup, we show how to establish our public-coin result, Theorem 1, leveraging existing algorithm building blocks, domain compression and user-level coin estimation. We restate it below:

Theorem 8 (LDP 1-bit asymmetric public coin uniformity testing).

There exists an asymmetric, public-coin, user-level locally differentially private algorithm for uniformity testing (over domain of size k) which on privacy and distance parameters ε>0 and α(0,1] takes

n=O(kmα2ε2)

users, each holding m i.i.d. samples from the unknown distribution and sending one bit of communication.

Proof (Detailed sketch).

The protocol works as follows: using public randomness, the n users jointly perform domain compression, i.e., a type of hashing of the domain, reducing the domain size from k to 2. By the following lemma, with constant probability, this preserves the distance between distributions up to a k factor:

Lemma 9 (Domain Compression [24]).

There exists absolute constants c1,c2>0 such that the following holds. For any 2k and any 𝐩,𝐪Δk,

PrΠ[dTV(𝐩Π,𝐪Π)c1kdTV(𝐩,𝐪)]c2

where Π=(Π1,,Π) is a uniformly random partition of [k] into subsets, and 𝐩ΠΔ denotes the probability distribution on [] induced by 𝐩 and Π.

That is, if 𝐩=𝐮k, then this (always) results in a distribution uniform over a domain size 2 (i.e., a fair coin), while if 𝐩 was α-far from uniform this results (with probability Ω(1)) in a coin with bias α=Ω(α/k).

All n users now having m samples from the same induced “coin”, all that remains is to learn the bias of this coin to accuracy α/4 in order to distinguish between the two cases. This can be done with the following existing protocol:

Theorem 10 ([4, Theorem 3.2]).

For ε(0,1], there exists a private-coin algorithm for person-level coin bias estimation with

𝔼[(p^p)2]=O(1nmε2)

assuming nClog(m)/ε2, where C>0 is an absolute constant.

Phrased differently, with the above one can privately learn the bias of a coin to an additive α, with arbitrary (constant) probability, in the user-level LDP setting, using n=O(1/(mα2ε2) users.

Putting things together and recalling our setting of α, the above protocol then succeeds with (small) constant probability, as long as

nmkα2ε2

“as claimed.” This is so far a symmetric protocol: but the probability of success, for distributions far from uniform, is quite small, as the domain compression only preserves the distances with small probability. To amplify the probability of success to 2/3 by standard techniques, we repeat the protocol on disjoint batches of users and combine their answers via a majority vote, leading to an asymmetric protocol. Finally, note that the condition on n from Theorem 10 is indeed satisfied, as kmα2ε2log(m)ε2.

4 Low-Bandwidth Private-Coin via Hadamard Matrices

In order to prove Theorem 2 we must first introduce our novel (non-private) algorithm for uniformity testing. Due to the technical depth of this section, and as the analysis follows straightforwardly in the private case, we defer the privacy analysis to Section 5 where we give a sketch of the proof. The complete proof of the private algorithm can be found in the full version, and largely follows the same path as the proof contained in this section, differing only in the application of binary randomized response at key points.

As discussed in Section 1, our high-level approach will proceed as follows: given n users, each holding m samples from some unknown discrete distribution 𝐩 over k=2t elements, we first assign each user i a group Gj, for 2jk (intentionally dropping a group so that we have k1 groups). As a first attempt, consider this assignment to be a deterministic function j(i) that evenly partitions the n users among k groups. Later, to remove this coordination step, which would lead to an asymmetric protocol, we will instead have each users select the index j of their group uniformly at random. We describe our algorithm in stages: first, we give in Section 4.1 an asymmetric algorithm well-suited for “small” values of m, before providing a complementary approach to detect large variations from uniformity (Section 4.2), and explaining how to put them together in Section 4.3.

4.1 An algorithm for small 𝒎

Using the same indexing as for the groups, let χj be the set of indices where a 1 appears in the column j of the Hadamard matrix H{±1}k×k, that is,

χj:={r[k]:Hrj=+1},2jk. (3)

The properties of Hadamard matrices ensure that each of its columns is a column vector of length 2t, where half of the positions are +1, and the other half are 1, and so |χj|=k/2 for all j. Hereafter, we will write 𝐩(χj) to denote the probability that a sample under 𝐩 falls in set χj, i.e., 𝐩(χj)=rχj𝐩r. User i then computes

Xi==1m𝟙{xχj(i)},

i.e., the number (out of their m samples) that lie within χj (the subset of the domain they are “monitoring”). One can observe the following distinction of cases:

  • If 𝐩=𝐮k, then XiBin(m,1/2).

  • If dTV(𝐩,𝐮k)>α, then XiBin(m,1/2+αj), for some αj which will be related to α by Lemma 13.

We call αj the bias observed by group Gj. User i, instead of directly sending Xi, will then compute the one-bit indicator

Yi=𝟙{Xim+12},

and send this to the server. We will show that, given n=O(k3/2mα2) independently drawn samples Y1,,Yn as above, there exists an algorithm that distinguishes between 𝐩=𝐮k and 𝐩 α-far from 𝐮k (with probability at least 2/3).

Theorem 11 (Asymmetric 1-bit multi-sample uniformity tester).

Given n users, each holding m samples of some unknown distribution 𝐩 on k elements. There exists an algorithm (Algorithm 1) that distinguishes between 𝐩=𝐮 and dTV(𝐩,𝐮)>α using

n=O(k3/2mα21),

samples. Moreover, each user only sends one bit.

(Note that this protocol, which follows the above outline, is asymmetric, as users are partitioned in k distinct groups of equal size, and users from different groups process their m samples differently.) This first algorithm is particularly well suited to the “small m regime” where m1/α2, where it achieves the desired number of users.

The rest of this subsection is dedicated to establishing Theorem 11. In view of this, we will need three ingredients: first, demonstrating that Yi has bias mαj (this factor m is where the dependence on m in the final bound will come from); second, showing that a distribution that is α-far from uniform induces bias in each group, such that j=1kαj2α2; and third, a known result on uniformity testing for Rademacher-product distributions. The proof proceeds immediately from these results: Each user sends their single bit Yi with bias mαj. We then take one bit from each group and arrange them into a vector. This vector simulates a sample from the product distribution with mean μ=(𝐩(χ1),,𝐩(χk)). Applying Lemma 13 we get that the expected 22 norm of this vector is at least mα2. Plugging in an 22-norm product tester as a black box, we conclude our proof.

As stated above, we begin by capturing the behaviour of Yi as a function of each αj. Recall that Yi is an indicator variable that declares whether a binomial Xi exceeded the mean it “should” have in the uniform case: in this sense, going from Xi to Yi converts a “many-bit” sample to a “single-bit” one. We show that for XBin(m,1/2±α) with small bias α, this indicator behaves as a Bernoulli with mean mα, and so this conversion from many bits to one still preserves both bias α and a dependence on m. For ease of exposition, we defer its proof to the end of the section.

Algorithm 1 Asymmetric Hadamard Protocol for Uniformity Testing.
Lemma 12 (From Many Bits, One).

Let m=211 be an odd integer, and α[1/2,1/2]. Define

Y:=𝟙{X}

where XBin(m,1/2+α). Then YBern(1/2+β), where

|β|=Ω(min(m|α|,1)).

(Moreover, if α=0, then β=0.)

While the bias αj observed by each group is clearly distribution-dependent, we need to relate them explicitly to the distance parameter α. To do so, we use the known fact that multiplication by a Hadamard matrix is 2-norm preserving.

Lemma 13 (Hadamard transform is norm-preserving).

As in the process described above, let 𝐩 be a distribution with dTV(𝐩,𝐮)α. Let H be the 2t×2t Hadamard matrix with ±1 entries. For each column j{2,3,,2t} (excluding the all-ones column), define χj:={i[k]:Hij=+1}, and let αj:=𝐩(χj)12 be the bias of column j under distribution 𝐩. Then

j=1kαj2α2.

(This follows, for instance, from [2, Lemma 3], combined with Cauchy–Schwarz to relate total variation and 2 distances.) For completeness, we provide a self-contained proof in the full version.

Finally, the third (and last) ingredient missing is an algorithm to test, given i.i.d. observations from a product distribution on k bits, whether the mean vector is zero (uniform distribution) or has large norm:

Lemma 14 (Uniformity testing on product distributions).

Fix any d2. There exists an algorithm that, given a parameter γ(0,d] and n i.i.d. samples from a product distribution 𝐩 on {1,1}d with μ:=𝔼X𝐩[X][1,1]d, has the following guarantees.

  • If μ2γ2, the algorithm returns accept w.p. 23;

  • If μ2γ, the algorithm returns reject w.p. 23;

as long as nCdγ2, for some absolute constant C>0.

For a proof of this in the case γ(0,1], see, e.g., [20, Section 2.1] or [21, Lemma 4.2], which establish this along the way, while focusing on testing in total variation distance, or [19, Theorem 4.1], which provides slightly stronger guarantees. We note that while stated only for γ(0,1], the proofs above actually implicitly show the result for the whole range of γ. For completeness, we provide a self-contained proof (for the whole range of γ) in the full version.

With these three building blocks in hand, we are ready to analyze Algorithm 1:

Proof of Theorem 11.

Recall that user i is deterministically assigned group index j(i){2,,k} such that each group is of equal size.666The first column (and row) of the Hadamard matrix are all-ones, and can be ignored, or otherwise simulated if necessary, as any user’s behaviour in this group would be to deterministically send a 1. Where i is clear from context or not relevant we suppress this notation, letting jj(i). We have each user send their Yi per Algorithm 1 and focus on the server’s view. Clearly Yi is distributed as some (yet unknown) Bernoulli YiBern(12+βj), where βj depends on 𝐩. Centering each of these, we get the Rademacher random variables

Zi=2(Yi1/2){1,1}

each with mean 𝔼[Zi]=βj(i). Now, we wish to apply Lemma 14 which takes as input samples from some product distribution. To facilitate this we construct our own vector samples by concatenating one Zi from each group. Each group Gj contains N=n/(k1) users. So, as described in Algorithm 1, we create the vectors {Z(1),,Z(N)} so that Zj() holds the bit sent by the ’th user of group j.

Each Z() therefore has mean vector μ=(β1,,βk). Applying Lemma 12, we get that, for all 1jk, (1) if 𝐩=𝐮k, then βj=0; and (2) otherwise, we have |βj|=Ω(min(mαj,1)). Combining this with Lemma 13, this vector μ satisfies:

  • if 𝐩=𝐮, then μ=𝟎k;

  • if dTV(𝐩,𝐮)>α, then

    μ22=j=1kβj2j=1k(mαj21) (4)

and the RHS is at least mα21. This is exactly the setting we need to invoke Lemma 14: setting γ=mα1, and N=n/(k1), this yields a final sample complexity of n=O(k3/2mα21).

Before proceeding to the next component of our algorithm, it remains to establish Lemma 12.

Proof of Lemma 12.

We start with the “Moreover” statement: if α=0, then XBin(m,1/2). By symmetry, mXBin(m,1/2), and since m=21,

Pr[X]=Pr[mX]=Pr[Xm]=Pr[X1]

from which Pr[X]=1/2.

Assume without loss of generality that α0 (as otherwise we can consider mX instead). To establish the first part of the statement, we will distinguish between three cases, depending on how large α.

  • First case: α2/m. By Cantelli’s inequality (Lemma 6), since 𝔼[X]=1/2+mα,

    Pr[X<]=Pr[X<𝔼[X](mα12)]Var[X]Var[X]+(mα12)211+mα215

    using mα>1 and Var[X]m/4. This shows that Pr[X]4/5, i.e., β3/10.

  • Second case: α<3/m.777The choice of the constant 3 in 3/m (instead of the more natural 1/m) may appear somewhat arbitrary: this specific value for the cut-off will be useful, for technical reasons, in the third and last case. In this case, the mean of X only differs by αm<3=O(1) from m/2, the mean of a standard Binomial X~Bin(m,1/2) (and the modes of the two distributions are either the same or very close integers), so the change in probability mass between the two is quite subtle. We can provide a coupling between X and X~ as follows:

    X=X~+Z~

    where the distribution of Z~, conditioned on X~, is Z~Bin(X~,2α). One can check that this satisfies XBin(m,1/2+α) (i.e., this is a valid coupling) and directly implies that XX~ a.s. Now, we have

    Pr[X] Pr[X~]+Pr[X~=1,Z~1]
    =12+Pr[X~=1]Pr[Bin(1,2α)1]
    =12+(211)2m(1(12α)1)
    =12+Θ(1α),

    using in the last step that α=O(1/). This shows that in this case β=Θ(α)=Θ(mα).

  • Third case: 3/mα<2/m. In this regime, we can rely on the Gaussian approximation, as quantified by the Berry–Esseen theorem (see, e.g., [34, Section 11.5], which guarantees that the CDF F of the normalized version of X,

    X:=X𝔼[X]Var[X]=2Xm(1+2α)m(14α2)

    is pointwise close to the CDF Φ of a standard Gaussian Z𝒩(0,1):

    supx|F(x)Φ(x)|Cm,

    for some absolute constant C>0 (one can take C=0.56). In particular, this implies, in our case, that

    Pr[X<] =Pr[X<2mα1m(14α2)]
    Pr[X<mα14α2] (as 2mα1mα)
    Pr[X<mα]
    Pr[Z<mα]+Cm (Berry–Esseen)
    12mα5+Cm (Studying Φ for mα[0,2])
    121100mα

    the last step using that mα3 and C0.56. This shows that, in this regime as well, β=Ω(mα).

This concludes the distinction of cases, and the proof.

4.2 An algorithm for large 𝒎

The above approach gives the “right” sample complexity under the restriction that

mαj21,j[k],

where αj=𝐩(χj)12. We here provide a different protocol, which works well when at least one of the |αj|’s is large. The main idea is to have each user just check is any of the k sets χj receives a lot more (or less) than half of their m samples, namely, m2±Ω(T) for some suitable threshold T=Θ(mlog(nk)). If 𝐩 is uniform, then this is highly unlikely, as by Lemma 7 the maximum deviation of k subgaussian random variables (here, Binomials) from their mean is less than T with overwhelming probability. But if 𝐩 is not uniform and one of the |αj|’s is large, then the number of samples falling in the corresponding set χj is a very biased Binomial, and for every user this set will receive m2±Ω(T) samples with high constant probability. We formalize this idea in Algorithm 2, starting with the non-private version; and prove the following theorem:

Algorithm 2 Symmetric protocol for large m.
Theorem 15.

There exists an algorithm (Algorithm 2) with the following guarantees:

  • If 𝐩=𝐮k, then the center outputs accept with probability at least 9/10;

  • If there exists 1jk such that αj=Ω(log(nk)/m), then the center outputs reject with probability at least 9/10.

Moreover, each person sends only one bit.

Proof.

In what follows, we set R:=2Tm=ln(20nk).

  • If 𝐩 is uniform, then 𝐩(χj)=1/2 for all j. By standard tail bounds for subgaussian random variables, this means that, for every 1in

    Pr[max1jk|Vj(i)m2|>T]110n

    given our setting of T (specifically, we apply Lemma 7 with t=ln(10n) to the k mean-zero random variables (Vj(i)m2)j, which are centered Binomials, and thus m4-subgaussian). By a union bound, we get Pr[i,vi=1]1/10, and the server outputs accept with probability at least 9/10.

  • Turning to the other case, assume there exists j[k] such that |αj|>Rm. Then, for every person i, |𝔼[Vj(i)]m2|>Rm, and

    max1jk|Vj(i)m2||Vj(i)m2|>T,

    where the last inequality holds with probability at least 2/3 by Chebyshev (as Rln(20k) is large enough, for large enough k). This implies that an expected 910n of the vi’s will be equal to 1: more precisely, i=1nviBin(n,τ) with τ9/10. The probability that such a Binomial is less that n/2 is at most 1/10 (and actually eΩ(n)), and so the center will reject with probability at least 9/10 (and actually 1eΩ(n)).

4.3 Combined Algorithm for all values of 𝒎

There exists a critical regime of parameters that Algorithm 1 struggles with, leading to suboptimal sample complexity; and these are handled exactly by Algorithm 2. We can handle both regimes as follows: have the server run both protocols, the former with n1=n1 and the latter with n2=1 users, and return accept if, and only if, both of them return accept. Assume

nk3/2logkmα2k.

Then, we have the following distinction of cases:

  • If 𝐩=𝐮k, then both protocols accept with probability at least 9/10 each, so overall the centre returns accept with probability at least 8/10;

  • If dTV(𝐩,𝐮k)>α, then

    • If there exists j[k] such that

      αj=Ω(log(k)/m)

      then, by Theorem 15, the second protocol (with one user) outputs reject with probability at least 9/10, in which case the center outputs reject.

    • Otherwise, then by (4) the mean of the product distribution in the first protocol is at least

      μ22j=1k(mαj21)j=1k(mαj2logk1)mα2logk (5)

      since mαj2logk for all j, and j=1kαj2α2. Then, concluding as in Section 4 by invoking the uniformity testing algorithm for product distributions of Lemma 14 (which also handles the full range of distance parameter γ2:=mα2logk(0,k]), the server rejects with probability at least 9/10, as

      nkk1/2γ21.

    Either way, at least one of the two tests outputs reject with probability 9/10, and so does the center.

4.4 Symmetric Protocols via Generalized Product Testing

The above protocol is asymmetric in that, as stated, it needs users to be divided into k groups. Of these, k1 groups will run Algorithm 1, each with a different column. The last group (of only 1 user) runs Algorithm 2.

To resolve the asymmetry in Algorithm 1 we prove the following statement, which covers the setting we require to make our protocol symmetric: each of n:=nd users independently selects a random coordinate and reports a sample from that coordinate. Formally, observations (X1,j1),(Xn,jn) are obtained by choosing, independently for each i[n], ji[d] uniformly at random, and Xi𝐩ji. In this case, the numbers of times n1,,nd each coordinate j[d] is sampled are (correlated) Bin(nd,1/d) random variables.

Theorem 16.

There exists an algorithm (available in the full version) which, given parameters γ(0,d], n1, and sample access to distributions 𝐩1,,𝐩d on {1,1}, chooses n1,,nd at random from a multinomial distribution with parameters n:=nd, d, and (1/d,,1/d); and then is given nj i.i.d. samples from each 𝐩j (where the samples are independent from the choice of ni’s). Then, letting μ:=𝔼X𝐩1𝐩d[X][1,1]d, it has the following guarantees.

  • If μ2γ2, the algorithm returns accept w.p. 23;

  • If μ2γ, the algorithm returns reject w.p. 23;

as long as nCdγ21 for some absolute constant C>0.

We defer the proof to the full version.

As Theorem 16 takes exactly the same parameters, and has exactly the same guarantees as Lemma 14, we do not restate the proof of Theorem 11. We need only note that having each user randomly sample their own index j and send it to the center is indeed distributed as described above. This of course increases communication to O(logk) bits.

This alone does not make the entire protocol symmetric. As we said, we still have one user assigned to running Algorithm 2. However, this is easily addressed. At the cost of one extra bit of communication per user, we can simply have all users run this second test, and then let the center select arbitrarily one of the n outcomes to use. This finally gives a (non-private) algorithm, and can be summarized as follows:

Theorem 17 (Symmetric private-coin uniformity testing).

There exists a symmetric, private-coin (non-private) algorithm for uniformity testing (over domain size k) which on distance parameter α(0,1] takes

n=O(k3/2logkmα2k)

users, each holding m i.i.d. samples from the unknown distribution, and sending O(logk) bits of communication.

Note that the first term dominates when mO(k/α2), which is the regime of interest (as otherwise a single user has enough samples to run a uniformity testing algorithm by themselves).

Of course, this result may still be somewhat underwhelming, as (albeit communication-efficient) the algorithm does not provide any privacy guarantee. In the next section, we will see how it can be easily adapted to yield our main result, Theorem 2.

5 Symmetric Locally Private Testing

We here sketch the analysis of the private analogues to the algorithms defined in Section 4, thus completing the proof of Theorem 2. We focus on making each of our two algorithms private before showing how they can be combined. In each case we will apply binary randomized response [33] to the bit returned by each user. As discussed before, the details of the analysis are provided in the full version of the paper.

Combining the two private algorithms gives us an asymmetric and symmetric protocol, each with sample complexity comparable to the m=1 case with m times as many users, up to logarithmic factors.

Private Algorithm 1 for small 𝒎.

We first introduce the private version of Algorithm 1, which handles the case when m is small. As described above, we make this algorithm private by an application of binary randomized response to the bit returned by each user. Consider the proof of Theorem 17, which itself follows the proof of Theorem 11. By applying randomized response to the bit sent by each user, one can quickly derive that each vector coordinate Zj() is now distributed as a Rademacher with mean |βj|=Ω(2eε1eε+1(mαj1)). Thereafter, applying the same steps as in the proof of Theorem 11 yields an algorithm with sample complexity

n=O(k3/2ε2(mα21)).

Noting that as ε grows, this rapidly converges to the non-private result.

Private Algorithm 2 for large 𝒎.

Recall that the second stage of our algorithm, defined in Section 4.2 handles the case when one of the subsets defined by the Hadamard matrix is overrepresented. Non-privately we only require that a single user perform this test for all of the subsets and report their response. Under local differential privacy this would not have any high-probability guarantee. Instead we use a known (and easily derived) bound for learning coins under binary randomized response. Specifically, that one can learn a Bernoulli through randomized response up to additive error α with success probability 2/3 using n=O(1/(α2ε2)) samples. This yields the following lemma:

Lemma 18 (Private version of Theorem 15).

Algorithm 2, when each user applies binary randomized response to their output, has the following guarantees:

  • If 𝐩=𝐮k, then the server outputs accept with probability at least 2eε+13(eε+1)

  • If there exists 1jk such that αj=Ω(log(nk)/m), then the server outputs reject with probability at least 2eε+13(eε+1)

Analysis of the Combined Algorithm.

As in the non-private case, we have to consider how to combine these two algorithms. First, we consider the “easy” asymmetric case; allocating n1 users to run the private analogue to Algorithm 1, we then only need n2=O(1/ε2) users to run the second protocol. As the number of users required by the second protocol is clearly dominated by the first, we retain much the same sample complexity up to a logarithmic factor. We defer the proof to the full version, however it suffices to say that we derive as a final sample complexity for the asymmetric algorithm

nk3/2mα2(ε21)log(kε1).

To make the protocol symmetric we can follow the same procedure described in Section 4.4 and have all users run the second protocol. This incurs two costs, (1) we must divide the privacy budget between these two protocols, and (2) we lose a logarithmic n factor in the final sample complexity.

Taking the path of least resistance, we assume that each user runs the private version of Algorithm 1 with privacy parameter ε1=ε/2, and likewise runs Algorithm 2 with ε2=ε/2. As such, we have n2=n1=n and so gain an n2 term in the log of Lemma 18. Setting n2 to the sample complexity derived for the private analogue of Algorithm 2, one can see that we require

nk3/2mα2(ε21)log(kmα(ε1)).

users. Combining both bounds completes the proof of Theorem 2.

6 Conclusion

User-level locally private distribution testing is still far from being understood. We observe many phase transitions as ε and m vary. Consider the algorithm for testing in the central model of differential privacy discussed in Section 1.2, when m exceeds the stated sample complexity we see that the required number of “users” n goes to 1, and only 1 bit of communication is needed.

How exactly do these algorithms behave as a sliding scale between the local and central models of differential privacy? Characterizing the behaviour in each regime is an ongoing and important field of research.

Future work.

Data generation is not always homogeneous: the distributions that users sample from are not truly identical; rather, it is more likely they are sampling from n distributions that could be similar or very far apart. User-level locally private distribution learning under limited heterogeneity is touched upon in [8], but we mirror their remark that this deserves further study.

A further heterogeneity that should be considered is the case when not all users hold the same number of samples m. In this case each may hold mi for each i[n]. It is not at all obvious how this could be handled neatly, and general results for this model could greatly help practical implementations.

References

  • [1] Jayadev Acharya, Clement Canonne, Cody Freitag, and Himanshu Tyagi. Test without trust: Optimal locally private distribution testing. In Kamalika Chaudhuri and Masashi Sugiyama, editors, Proceedings of Machine Learning Research, volume 89 of Proceedings of Machine Learning Research, pages 2067–2076. PMLR, 16–18 April 2019. URL: http://proceedings.mlr.press/v89/acharya19b.html.
  • [2] Jayadev Acharya, Clément L. Canonne, Cody Freitag, Ziteng Sun, and Himanshu Tyagi. Inference under information constraints III: local privacy constraints. IEEE J. Sel. Areas Inf. Theory, 2(1):253–267, 2021. doi:10.1109/JSAIT.2021.3053569.
  • [3] Jayadev Acharya, Clément L. Canonne, Yuhan Liu, Ziteng Sun, and Himanshu Tyagi. Interactive inference under information constraints. CoRR, abs/2007.10976, 2020. arXiv:2007.10976.
  • [4] Jayadev Acharya, Clément L. Canonne, Yuhan Liu, Ziteng Sun, and Himanshu Tyagi. Distributed estimation with multiple samples per user: Sharp rates and phase transition. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2021.
  • [5] Jayadev Acharya, Clément L. Canonne, and Himanshu Tyagi. Inference under information constraints I: Lower bounds from chi-square contraction. IEEE Trans. Inform. Theory, 66(12):7835–7855, 2020. Preprint available at arXiv:abs/1812.11476. doi:10.1109/TIT.2020.3028440.
  • [6] Jayadev Acharya, Clément L. Canonne, and Himanshu Tyagi. Inference under information constraints II: Communication constraints and shared randomness. IEEE Transactions on Information Theory, 2020. In press. Preprint available at arXiv:abs/1804.06952. doi:10.1109/TIT.2020.3028439.
  • [7] Jayadev Acharya, Clément L. Canonne, Yanjun Han, Ziteng Sun, and Himanshu Tyagi. Domain compression and its application to randomness-optimal distributed goodness-of-fit. In Jacob Abernethy and Shivani Agarwal, editors, Proceedings of Thirty Third Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research, pages 3–40. PMLR, 09–12 July 2020. URL: http://proceedings.mlr.press/v125/acharya20a.html.
  • [8] Jayadev Acharya, Yuhan Liu, and Ziteng Sun. Discrete distribution estimation under user-level local differential privacy. In AISTATS, volume 206 of Proceedings of Machine Learning Research, pages 8561–8585. PMLR, 2023. URL: https://proceedings.mlr.press/v206/acharya23a.html.
  • [9] Jayadev Acharya and Ziteng Sun. Communication complexity in locally private distribution estimation and heavy hitters. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 51–60, Long Beach, California, USA, 09–15 June 2019. PMLR. URL: http://proceedings.mlr.press/v97/acharya19c.html.
  • [10] Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. Differentially private testing of identity and closeness of discrete distributions. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 6878–6891. Curran Associates, Inc., 2018. URL: http://papers.nips.cc/paper/7920-differentially-private-testing-of-identity-and-closeness-of-discrete-distributions.pdf.
  • [11] Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. Hadamard response: Estimating distributions privately, efficiently, and with little communication. In Kamalika Chaudhuri and Masashi Sugiyama, editors, Proceedings of Machine Learning Research, volume 89 of Proceedings of Machine Learning Research, pages 1120–1129. PMLR, 16–18 April 2019. URL: http://proceedings.mlr.press/v89/acharya19a.html.
  • [12] Maryam Aliakbarpour, Arnav Burudgunte, Clément L. Canonne, and Ronitt Rubinfeld. Better private distribution testing by leveraging unverified auxiliary data. In COLT, volume 291 of Proceedings of Machine Learning Research, pages 22–63. PMLR, 2025. URL: https://proceedings.mlr.press/v291/aliakbarpour25a.html.
  • [13] Maryam Aliakbarpour, Ilias Diakonikolas, Daniel Kane, and Ronitt Rubinfeld. Private testing of distributions via sample permutations. In NeurIPS, pages 10877–10888, 2019. URL: https://proceedings.neurips.cc/paper/2019/hash/8e036cc193d0af59aa9b22821248292b-Abstract.html.
  • [14] Maryam Aliakbarpour, Ilias Diakonikolas, and Ronitt Rubinfeld. Differentially private identity and equivalence testing of discrete distributions. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 169–178, Stockholmsmässan, Stockholm Sweden, 10–15 July 2018. PMLR. URL: http://proceedings.mlr.press/v80/aliakbarpour18a.html.
  • [15] Kareem Amin, Matthew Joseph, and Jieming Mao. Pan-private uniformity testing. In Jacob Abernethy and Shivani Agarwal, editors, Proceedings of Thirty Third Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research, pages 183–218. PMLR, 09–12 July 2020. URL: http://proceedings.mlr.press/v125/amin20a.html.
  • [16] Apple Differential Privacy Team. Learning with Privacy at Scale, 2017. URL: https://machinelearning.apple.com/research/learning-with-privacy-at-scale.
  • [17] Victor Balcer, Albert Cheu, Matthew Joseph, and Jieming Mao. Connecting robust shuffle privacy and pan-privacy. CoRR, abs/2004.09481, 2020. arXiv:2004.09481.
  • [18] Bryan Cai, Constantinos Daskalakis, and Gautam Kamath. Priv’it: Private and sample efficient identity testing. In Proceedings of the 34th International Conference on Machine Learning, ICML ’17, pages 635–644. JMLR, Inc., 2017. URL: http://proceedings.mlr.press/v70/cai17a.html.
  • [19] Clément L. Canonne, Xi Chen, Gautam Kamath, Amit Levi, and Erik Waingarten. Random restrictions of high dimensional distributions and uniformity testing with subcube conditioning. In SODA, pages 321–336. SIAM, 2021. doi:10.1137/1.9781611976465.21.
  • [20] Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Testing Bayesian Networks. In Satyen Kale and Ohad Shamir, editors, Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 370–448, Amsterdam, Netherlands, 07–10 July 2017. PMLR. URL: http://proceedings.mlr.press/v65/canonne17a.html.
  • [21] Clément L. Canonne, Gautam Kamath, Audra McMillan, Jonathan Ullman, and Lydia Zakynthinou. Private identity testing for high-dimensional distributions. In Advances in Neural Information Processing Systems 33, 2020. To appear. Preprint available at arXiv:abs/1905.11947.
  • [22] Clément L. Canonne and Hongyi Lyu. Uniformity testing in the shuffle model: Simpler, better, faster. In Karl Bringmann and Timothy M. Chan, editors, 5th Symposium on Simplicity in Algorithms, SOSA@SODA 2022, Virtual Conference, January 10-11, 2022, pages 182–202. SIAM, 2022. doi:10.1137/1.9781611977066.13.
  • [23] Clément L. Canonne and Joy Qiping Yang. Simpler distribution testing with little memory. In Merav Parter and Seth Pettie, editors, 2024 Symposium on Simplicity in Algorithms, SOSA 2024, Alexandria, VA, USA, January 8-10, 2024, pages 406–416. SIAM, 2024. doi:10.1137/1.9781611977936.37.
  • [24] Clément L. Canonne. Topics and techniques in distribution testing: A biased but representative sample. Foundations and Trends® in Communications and Information Theory, 19(6):1032–1198, 2022. doi:10.1561/0100000114.
  • [25] Ilias Diakonikolas and Daniel M. Kane. A new approach for testing properties of discrete distributions. In 57th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2016. IEEE Computer Society, 2016. doi:10.1109/FOCS.2016.78.
  • [26] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography, volume 3876 of Lecture Notes in Comput. Sci., pages 265–284. Springer, Berlin, 2006. doi:10.1007/11681878_14.
  • [27] Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211–407, 2013. doi:10.1561/0400000042.
  • [28] Cynthia Dwork, Guy N. Rothblum, and Salil Vadhan. Boosting and Differential Privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 51–60, 2010. ISSN: 0272-5428. doi:10.1109/FOCS.2010.12.
  • [29] Oded Goldreich. The uniform distribution is complete with respect to testing identity to a fixed distribution. Electronic Colloquium on Computational Complexity (ECCC), 23:15, 2016. URL: http://eccc.hpi-web.de/report/2016/015.
  • [30] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Technical Report TR00-020, Electronic Colloquium on Computational Complexity (ECCC), 2000.
  • [31] Clement C. (https://math.stackexchange.com/users/75808/clement c). Tail bounds for maximum of sub-gaussian random variables. Mathematics Stack Exchange, 2023. URL:https://math.stackexchange.com/q/4002713 (version: 2023-12-21). arXiv:https://math.stackexchange.com/q/4002713.
  • [32] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM J. Comput., 40(3):793–826, 2011. doi:10.1137/090756090.
  • [33] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. What can we learn privately? SIAM J. Comput., 40(3):793–826, 2011. doi:10.1137/090756090.
  • [34] Ryan O’Donnell. Analysis of Boolean functions. Cambridge University Press, New York, 2014. doi:10.1017/CBO9781139814782.
  • [35] Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750–4755, 2008. doi:10.1109/TIT.2008.928987.
  • [36] Or Sheffet. Locally private hypothesis testing. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4612–4621, Stockholmsmässan, Stockholm Sweden, 10–15 July 2018. PMLR. URL: http://proceedings.mlr.press/v80/sheffet18a.html.
  • [37] Berrett Thomas and Butucea Cristina. Locally private non-asymptotic testing of discrete distributions is faster using interactive mechanisms. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, pages 3164–3173, Red Hook, NY, USA, 2020. Curran Associates Inc.
  • [38] Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965. doi:10.1080/01621459.1965.10480775.