
Privacy-Computation Trade-Offs in Private Repetition and Metaselection

Kunal Talwar, Apple, Cupertino, CA, USA
Abstract

A Private Repetition algorithm takes as input a differentially private algorithm with constant success probability and boosts it to one that succeeds with high probability. These algorithms are closely related to private metaselection algorithms that compete with the best of many private algorithms, and private hyperparameter tuning algorithms that compete with the best hyperparameter settings for a private learning algorithm. Existing algorithms for these tasks pay either a large overhead in privacy cost, or a large overhead in computational cost. In this work, we show strong lower bounds for problems of this kind, showing in particular that for any algorithm that preserves the privacy cost up to a constant factor, the failure probability can only fall polynomially in the computational overhead. This is in stark contrast with the non-private setting, where the failure probability falls exponentially in the computational overhead. By carefully combining existing algorithms for metaselection, we prove computation-privacy tradeoffs that nearly match our lower bounds.

Keywords and phrases:
Differential Privacy, Hyperparameter Tuning, Metaselection
Copyright and License:
© Kunal Talwar; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Security and privacy; Theory of computation → Theory and algorithms for application domains
Related Version:
Full Version: https://arxiv.org/abs/2410.19012
Editors:
Mark Bun

1 Introduction

Randomized algorithms have probabilistic guarantees. Often one designs an algorithm that succeeds (for an appropriate notion of success) with constant probability. A standard approach to boosting the success probability of a randomized algorithm is to run the algorithm with fresh randomness multiple times and take the best of the results of the individual runs. Standard tail inequalities then allow for proving bounds on the success probability of the resulting algorithm. Suppose that an algorithm 𝒜 produces an output with quality score at least m with probability at least 1/2. (Without loss of generality, we assume our goal is to maximize a quality score; our results can be translated immediately to a setting where we want to minimize a quality score. The constant 1/2 can be replaced by any other constant.) As an example, the algorithm of [13] for finding a maximum cut in a graph finds a cut with value within a small constant factor of the optimum with constant probability. Let 𝒜maxT be the algorithm that runs 𝒜 T times to get outputs o_1, …, o_T and releases the o_i with the highest quality score. The basic repetition theorem says that 𝒜maxT produces an output with quality score at least m except with probability 1/2^T. An important aspect of this result is the relationship between the failure probability and the number of repetitions: logarithmically many repetitions suffice to make the failure probability polynomially small. In the example above, repeating the maximum cut algorithm 20 times ensures that the best cut amongst these is approximately optimal except with probability 10^{−6}. Various versions of repetition theorems have been studied in the literature, e.g. improving on the amount of randomness needed [19], or allowing for parallelizability in the case of multiple round protocols [27, 18].
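For concreteness, here is a minimal (non-private) sketch of 𝒜maxT in Python; the base routine and quality score below are hypothetical stand-ins for an algorithm that succeeds with probability 1/2.

```python
import random

def best_of_T(base_algorithm, T, rng):
    """A_max^T: run the base algorithm T times with fresh randomness and
    release the (output, score) pair with the highest quality score."""
    best = None
    for _ in range(T):
        output, score = base_algorithm(rng)
        if best is None or score > best[1]:
            best = (output, score)
    return best

# Hypothetical base routine: achieves score 1 ("success") with probability 1/2.
def toy_base(rng):
    return ("some-object", 1 if rng.random() < 0.5 else 0)

# With T = 20 repetitions the failure probability is 2**-20 < 1e-6.
print(best_of_T(toy_base, T=20, rng=random.Random(0)))
```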

In this work, we are interested in this question when the algorithm 𝒜 is differentially private (see Section 2 for a precise definition). We assume we have a differentially private algorithm that outputs an object (e.g. an ML model) and a score (e.g. approximate accuracy on a test dataset), where the score is “high” with constant probability. Private Repetition theorems allow for boosting the success probability, while controlling the privacy cost. A simple repetition theorem uses the 𝒜maxT above. To get failure probability below γ, it would suffice to set T = log₂(1/γ). This increases the privacy parameters (e.g. the ε in (ε,δ)-DP) of the algorithm by a factor of T, which may be undesirable. Liu and Talwar [20] designed and analyzed a different algorithm for private repetition that we will refer to as 𝒜LT for the rest of the introduction. The algorithm 𝒜LT allows for a constant (3×) increase in the privacy cost, while allowing an arbitrarily small failure probability γ. However, it requires running the algorithm 𝒜 about O(1/γ) times, instead of the logarithmic dependence on 1/γ in the naive repetition theorem.

Figure 1: Existing and new results on Private Repetition (left) and Private Hyperparameter tuning and Metaselection (right). Green dots show the existing upper bounds 𝒜LT and 𝒜maxT discussed above, and the gray regions depict the existing excluded regions. The red line shows our new lower bounds: we exclude the full region below the red line. The lower bound nearly matches the blue dashed line that depicts our analysis of hybrid algorithms.

In other words, existing private repetition theorems either pay a log(1/γ) overhead in the privacy cost, or a poly(1/γ) overhead in the run time. A similar dichotomy exists for private hyperparameter selection, where one wants to privately select the best model amongst those resulting from different hyperparameter choices. Indeed, private hyperparameter tuning is one of the standard uses of private repetition theorems [20, 26], and the 𝒜LT algorithm has been used in practice recently by Israel’s Ministry of Health in their release of 2014 live births microdata [17]. The private learning algorithm for a single hyperparameter setting can be computationally expensive to run. As an example, in Hod and Canetti [17], the algorithm 𝒜 can involve training a CTGAN [33, 34, 28] using DPSGD [1] or PATE [24, 25] for a certain set of hyperparameters. In such cases, the poly(1/γ) run time overhead can be quite significant.

As we show in Section 4, the two algorithms above can be combined to give a smooth trade-off between the privacy cost overhead and the computational overhead. It is natural to ask, however, if such a trade-off is necessary: is there a way to keep the privacy overhead constant while paying only a logarithmic (in 1/γ) overhead in the run time?

In this work, we give a negative answer to this question. We show that for any algorithm with constant privacy overhead, the run time must necessarily be polynomial in 1/γ. More generally, our lower bound shows that the trade-off between the computational overhead (denoted by T below) and the privacy overhead (denoted by c below) is nearly tight.

Theorem (Informal version of Theorem 9).

Let 𝒜 be an oracle algorithm that satisfies the following properties:

Privacy: For any ε-DP mechanism ℳ: 𝒟^n → (R × ℝ), 𝒜^ℳ is (cε,δ)-DP.

Utility: For any input d, let m be such that Pr[val(ℳ(d)) ≥ m] ≥ 1/2. Then 𝒜 satisfies Pr[val(𝒜^ℳ(d)) ≥ m] ≥ 1−γ.

Runtime: 𝒜^ℳ on input d makes at most T calls to ℳ (on any inputs) in expectation.

Then for δ < O(εγ/ln T), T ≥ γ^{−Ω(1/c)}, or equivalently, c ≥ Ω(ln(1/γ) / ln T).

These results are shown pictorially in Figure 1 (left). The two green filled dots denote the trade-offs achieved by 𝒜LT and 𝒜maxT. The blue dashed line is the upper bound achieved by combining these algorithms (Section 4). The grey shaded region can be excluded by straightforward arguments. The lower bound above excludes the red striped region. In particular, it shows that for any constant c, T must be at least poly(1/γ). At the other extreme, any T that is polylogarithmic in 1/γ requires c ≥ Ω(ln(1/γ) / ln ln(1/γ)). These results hold even when the target value m is public.

We extend our results to the setting where the target value m is achieved by the mechanism with a probability that is different from 1/2. More generally, in the hyperparameter tuning setting, there are K possible settings of hyperparameters, and our utility bound asks that the algorithm be competitive with the median value of the best hyperparameter setting.

Theorem (Informal version of Theorem 10).

Let 𝒜 be an oracle algorithm that satisfies the following properties:

Privacy: For any ε-DP mechanism ℳ: 𝒟^n × [K] → (R × ℝ), 𝒜^ℳ: 𝒟^n → (R × ℝ) is (cε,δ)-DP.

Utility: For any input d, let m be such that Pr_{k∼[K]}[val(ℳ(d,k)) ≥ m] ≥ 1/(2K), where the probability is over a uniformly random k ∈ [K] and the randomness of ℳ. Then Pr[val(𝒜^ℳ(d)) ≥ m] ≥ 1−γ.

Runtime: 𝒜^ℳ on input d makes at most T calls to ℳ (on any inputs) in expectation.

Then for δ < O(εγ/ln T), T ≥ K·γ^{−Ω(1/c)}, or equivalently, c ≥ Ω(ln(1/γ) / ln(T/K)).

We show these results in Figure 1 (right). As before, the two green filled dots denote the trade-offs achieved by 𝒜LT and 𝒜maxT. Note that here 𝒜maxT is significantly worse, and the better upper bound is achieved by combining 𝒜LT with 𝒜maxT. As before, the grey shaded region is excluded by straightforward arguments, and the new lower bound excludes the red striped region. In particular, it shows that for any constant c, T must be at least K·poly(1/γ). At the other extreme, making T/K polylogarithmic in 1/γ requires c ≥ Ω(ln(1/γ) / ln ln(1/γ)). As before, the lower bound holds for known target value m.

This has strong implications for private hyperparameter tuning. Indeed, our results imply that any private hyperparameter tuning algorithm that boosts the success probability to (1−γ) while paying only a constant overhead in privacy cost must pay a K·poly(1/γ) computational overhead.

We remark that both the upper bounds translate ε-DP algorithms to cε-DP algorithms, and (ε,δ)-DP algorithms to (cε,Tδ)-DP algorithms; similar results hold for Renyi DP algorithms as well [26]. Our lower bounds apply even when the input algorithm is ε-DP, and the output algorithm is allowed to be (ε,δ)-DP (or Renyi DP).

Other Related Work.

Gupta et al. [14] first proved a repetition theorem for the case when the target threshold is known and the value is a low-sensitivity function. Their result required T = O(1/γ²) for repetition and T = O(K²/γ²) for metaselection. As discussed, the random stopping approach of [20] improves these to T = O(1/γ) and T = O(K/γ) without any restrictions. Subsequently, [26] studied these questions under Renyi differential privacy, and showed positive results for a variety of distributions of stopping times.

There is also a beautiful line of work on meta-algorithms that run multiple private algorithms iteratively, and only pay the privacy cost of those that meet a certain criterion. The Sparse Vector Technique [10, 29, 15, 32] is the simplest version of such a meta-algorithm, where one only pays privacy cost for (numerical, bounded-sensitivity) queries whose answer is above a threshold. This paradigm was significantly expanded by Cohen and Lyu [6], who show a similar result for general target sets, under appropriate assumptions on the algorithms.

Decision vs. Optimization.

Repetition theorems in complexity theory are normally stated for decision problems. This is justified by the fact that one can reduce optimization problems to decision problems with logarithmic (in the range) overhead. However a logarithmic overhead in privacy cost would be quite significant as one typically wants ε to be a small constant. For this reason, all the above works [14, 20, 26] directly address optimization problems. As decision problems can be reduced to optimization, our results also apply to decision problems.

The need for Private Repetition.

Private algorithms come in many flavors. In many cases, the algorithm itself or a slight modification can be better analyzed to make its failure probability small, at a small cost to the utility criteria. For example, the Laplace mechanism [9] and the exponential mechanism [21] can directly give utility bounds that hold with probability (1−γ) for any γ. In these cases, we do not need private repetition algorithms that use a base algorithm as an oracle. However, for many other private algorithms, we do not have such sliding-scale guarantees and can only control the expected utility, or the median utility. This is often the case when the randomization plays a role in the “non-private” part of the algorithm, e.g. in the use of tree embeddings [14, 12], hashing [3], or other more custom analyses [14, 7]. Finally, for some problems, (private) algorithms work well in practice but may not have theoretical guarantees (or work much better than their theoretical guarantees), and one does not have a knob to tune the failure probability. For these two kinds of algorithms, private repetition theorems are an invaluable tool for reducing the failure probability.

Metaselection and hyperparameter tuning algorithms are even more broadly useful. It is fairly common to have algorithms that take a hyperparameter as input and work well either provably or empirically. As an example, algorithms in the Propose-test-release framework [8] work well when the proposal is true for the distribution. In cases when the proposal can be parameterized (e.g. in [5]), it becomes a hyperparameter. Practical private optimization algorithms (e.g. [1]) typically involve multiple hyperparameters. Often the hyperparameters cannot be well-tuned on public proxy datasets, and can be a source of leakage of private information [26]. Private Hyperparameter tuning algorithms are practically used in such settings and the computational cost can be a significant concern.

2 Preliminaries

Differential Privacy [9] constrains the distribution of outcomes on neighboring datasets to be close. The Hamming distance |d−d′|_H between two datasets d, d′ ∈ 𝒟^n is the number of entries in which they differ. For our purposes, two datasets d, d′ ∈ 𝒟^n are neighboring if they differ in one entry, i.e. if their Hamming distance is 1.

Definition 1.

A randomized algorithm ℳ: 𝒟^n → R is (ε,δ)-differentially private (DP) if for any pair of neighboring datasets d, d′ and any S ⊆ R,

Pr[ℳ(d) ∈ S] ≤ e^ε·Pr[ℳ(d′) ∈ S] + δ.

We abbreviate (ε,0)-DP as ε-DP and refer to it as pure differential privacy.

Sometimes it is convenient to define a mechanism on a subset of inputs, and establish its privacy properties on this restricted subset. The following extension lemma due to Borgs et al. [4] shows that any such partial specification can be extended to the set of all datasets at a small increase in the privacy cost.

Proposition 2 (Extension Lemma).

Let ℳ^B be an ε-differentially private algorithm restricted to B ⊆ 𝒟^n, so that for any d, d′ ∈ B and any S ⊆ R, Pr[ℳ^B(d) ∈ S] ≤ Pr[ℳ^B(d′) ∈ S]·exp(ε·|d−d′|_H). Then there exists a randomized algorithm ℳ defined on the whole input space 𝒟^n which is 2ε-differentially private and satisfies that for every d ∈ B, ℳ(d) has the same distribution as ℳ^B(d).

We will be interested in private mechanisms that output a tuple (o,v) ∈ R × ℝ, where R is some arbitrary output space. For example, if the mechanism is computing an approximate maximum cut in a graph, o could be a subset of vertices of a graph and v the (approximate) number of edges in the input graph that leave this subset o. In the case of hyperparameter tuning, o would be a model and v an estimate of its accuracy. We denote by val(ℳ(d)) the real value v when ℳ(d) = (o,v). For a mechanism ℳ: 𝒟^n → R × ℝ, we let Median(ℳ(d)) = sup{z : Pr_{(o,v)∼ℳ(d)}[v ≥ z] ≥ 1/2} denote the median of v when (o,v) ∼ ℳ(d). A private amplification algorithm takes as input a dataset d and a failure probability γ, and given oracle access to a mechanism ℳ, outputs an (o,v) such that v ≥ Median(ℳ(d)) with probability (1−γ). Here this probability is taken over both the internal randomness of the amplification algorithm as well as the randomness in ℳ. A private amplification algorithm may need to make T oracle calls to ℳ, and may be cε-DP whenever ℳ is ε-DP. Our goal is to understand the trade-off between the computational overhead T and the privacy overhead c.

We will assume that 𝒜 can only output an (o,v) tuple that is produced by a run of ℳ on some input. This is a natural restriction in all settings, and is needed to make the problem meaningful. (E.g., a trivial algorithm that computes (o,v) ∼ ℳ(d) and outputs (o,∞) would not be useful in any application.) Thus in this paper, we will always make the assumption that the oracle algorithm 𝒜 selects a tuple (o,v) from the outputs it receives from calls to ℳ.

A private metaselection algorithm operates on a set of K private algorithms ℳ_1, …, ℳ_K, where each ℳ_i: 𝒟^n → R × ℝ. It takes as input a dataset d and a failure probability γ, and given oracle access to the mechanisms ℳ_i, outputs an (o,v) such that v ≥ max_i Median(ℳ_i(d)) with probability (1−γ). As above, the probability is taken over both the internal randomness of the metaselection algorithm as well as the randomness in the ℳ_i’s. The metaselection algorithm may need to make T oracle calls to the ℳ_i’s, and may be cε-DP whenever each ℳ_i is ε-DP. Our goal as before is to understand the trade-off between the computational overhead T and the privacy overhead c. Note that hyperparameter tuning can be phrased as metaselection, by treating the private training algorithm for each setting of hyperparameters as a separate algorithm amongst K options. More generally, we can phrase metaselection and hyperparameter tuning as special cases of repetition, where the target value is the (1 − 1/(2K))-th quantile of the distribution instead of the median. This variant can handle the case when the set of hyperparameter settings may be large (or even unbounded) but we expect a non-trivial measure of the hyperparameter settings to be good.

Proposition 3.

Let β>0 and 𝒜 be an oracle algorithm that satisfies the following properties:

Privacy: For any ε-DP mechanism ℳ: 𝒟^n → (R × ℝ), 𝒜^ℳ is (cε,δ)-DP.

Utility: For any input d, let m be such that Pr[val(ℳ(d)) ≥ m] ≥ β. Then 𝒜 satisfies Pr[val(𝒜^ℳ(d)) ≥ m] ≥ 1−γ.

Runtime: 𝒜^ℳ on input d makes at most T = T(c,β) calls to ℳ (on any inputs) in expectation.

Let H be a hyperparameter space, and let μ_H be a measure on H. Then there is an oracle algorithm 𝒜~ that satisfies the following properties:

Privacy: For any ε-DP mechanism ℳ: 𝒟^n × H → (R × ℝ), 𝒜~^ℳ is (cε,δ)-DP.

Utility: For any input d, let m be such that Pr_{h∼μ_H}[val(ℳ(d,h)) ≥ m] ≥ β. Then Pr[val(𝒜~^ℳ(d)) ≥ m] ≥ 1−γ.

Runtime: 𝒜~^ℳ on input d makes at most T = T(c,β) calls to ℳ (on any inputs) in expectation.

In particular, when H = [K] and μ_H is uniform, 𝒜~ competes with the largest median value max_{k∈[K]} Median(val(ℳ(d,k))) and makes T = T(c, 1/(2K)) calls to ℳ.

Proof.

Define ℳ′ that, on input d, samples a random h ∼ μ_H and runs ℳ(d,h). Running 𝒜 with ℳ′ as its oracle gives the first result. The second result uses the fact that, when H = [K] and μ_H is uniform, the value of ℳ′(d) is at least the largest median max_{k∈[K]} Median(val(ℳ(d,k))) with probability at least 1/(2K).
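The reduction above is just a data-independent resampling of the hyperparameter; a minimal sketch (with illustrative names, not the paper's code) is:

```python
import random

def randomize_hyperparameter(mechanism, hyperparameters, rng=random.Random(0)):
    """Wrap an eps-DP mechanism M(d, k) into the mechanism M'(d) from the
    proof of Proposition 3: sample k uniformly at random and run M(d, k).
    M' is still eps-DP, since k is chosen independently of the data, and with
    probability at least 1/(2K) the value of M'(d) is at least the largest
    per-k median."""
    def wrapped(dataset):
        k = rng.choice(list(hyperparameters))   # h ~ uniform over [K]
        return mechanism(dataset, k)            # returns an (output, value) pair
    return wrapped

# Any private repetition algorithm can now be run with `wrapped` as its
# oracle, turning it into a metaselection / hyperparameter tuning algorithm.
```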

Differential privacy constrains the likelihood of events on neighboring datasets. This also implies constraints for datasets at some bounded distance. The following forms of these constraints, which are elementary consequences of group privacy [11], will be useful.

Proposition 4.

Let ℳ satisfy ε-DP. Then for datasets d, d′ such that |d−d′|_H ≤ Δ, and any event E,

Pr[ℳ(d) ∈ E] ≤ e^{εΔ}·Pr[ℳ(d′) ∈ E].

In particular, if Pr[ℳ(d) ∈ E] ≥ 1/4, then Pr[ℳ(d′) ∈ E] ≥ e^{−εΔ}/4.

Proposition 5.

Let ℳ satisfy (ε,δ)-DP. Then for datasets d, d′ such that |d−d′|_H = Δ, and any event E,

Pr[ℳ(d) ∈ E] ≤ e^{εΔ}·(Pr[ℳ(d′) ∈ E] + Δδ).

In particular, if δ < e^{−εΔ}/(8Δ) and Pr[ℳ(d) ∈ E] ≥ 1/4, then Pr[ℳ(d′) ∈ E] ≥ e^{−εΔ}/8.

A similar statement holds for Renyi differential privacy.

Proposition 6.

Let ℳ satisfy (α,ε)-Renyi DP for some α > 1. Then for datasets d, d′ such that |d−d′|_H = Δ, and any event E,

Pr[ℳ(d′) ∈ E] ≤ e^{ε·∑_{i=1}^{Δ}(1−1/α)^i}·(Pr[ℳ(d) ∈ E])^{(1−1/α)^Δ}.

In particular, for α ≥ Δ+1, we get

Pr[ℳ(d′) ∈ E] ≤ e^{(α−1)ε}·(Pr[ℳ(d) ∈ E])^{1/e}.

It follows that for α ≥ Δ+1, if Pr[ℳ(d′) ∈ E] ≥ 1/4, then Pr[ℳ(d) ∈ E] ≥ e^{−e(α−1)ε}/4^4.

Proof.

Let d = d_0, d_1, …, d_Δ = d′ be a sequence of datasets where d_i and d_{i+1} are adjacent. A consequence of (α,ε)-RDP [22, Prop. 10] is that for any event E,

Pr[ℳ(d_{i+1}) ∈ E] ≤ e^{ε(1−1/α)}·(Pr[ℳ(d_i) ∈ E])^{(1−1/α)}.

This implies the base case (k=1) of the claim that for all k,

Pr[ℳ(d_k) ∈ E] ≤ e^{ε·∑_{i=1}^{k}(1−1/α)^i}·(Pr[ℳ(d_0) ∈ E])^{(1−1/α)^k}.

Suppose that the claim holds for d_{k−1}. We now inductively prove it for d_k. We write

Pr[ℳ(d_k) ∈ E] ≤ e^{ε(1−1/α)}·(Pr[ℳ(d_{k−1}) ∈ E])^{(1−1/α)}
≤ e^{ε(1−1/α)}·(e^{ε·∑_{i=1}^{k−1}(1−1/α)^i}·(Pr[ℳ(d_0) ∈ E])^{(1−1/α)^{k−1}})^{(1−1/α)}
= e^{ε·∑_{i=1}^{k}(1−1/α)^i}·(Pr[ℳ(d_0) ∈ E])^{(1−1/α)^k}.

This completes the proof of the first part. For the second part, we use the following two facts. Firstly, for α ≥ Δ+1, (1−1/α)^Δ ≥ 1/e, so that the exponent of Pr[ℳ(d_0) ∈ E] is at least 1/e (when k = Δ). Secondly, the geometric series ∑_{i=1}^{Δ}(1−1/α)^i ≤ ∑_{i=1}^{∞}(1−1/α)^i = α−1. The third part follows immediately by rearrangement.
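As a quick numeric sanity check of the two facts used in the second part (not part of the proof), the following snippet verifies that for α = Δ+1 the exponent (1−1/α)^Δ stays above 1/e and the geometric sum stays below α−1:

```python
import math

# Facts used above: for alpha = Delta + 1,
#   (1 - 1/alpha)**Delta >= 1/e   and   sum_{i=1}^{Delta} (1 - 1/alpha)**i <= alpha - 1.
for Delta in [1, 2, 5, 10, 100, 1000]:
    alpha = Delta + 1
    r = 1 - 1 / alpha
    exponent = r ** Delta
    geometric_sum = sum(r ** i for i in range(1, Delta + 1))
    assert exponent >= 1 / math.e and geometric_sum <= alpha - 1
    print(Delta, round(exponent, 4), round(geometric_sum, 2))
```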

3 Main Lower Bound

Theorem 7.

Let 𝒜 be an oracle algorithm that satisfies the following properties:

Privacy: For any ε-DP mechanism ℳ: 𝒟^n → (R × ℝ), 𝒜^ℳ is (cε,δ)-DP.

Utility: For any input d, Pr[val(𝒜^ℳ(d)) ≥ Median(val(ℳ(d)))] ≥ 1−γ.

Runtime: 𝒜^ℳ on input d makes at most T calls to ℳ on input d in expectation, for some T ≥ e^ε.

Then for δ < εγ/(4 ln 4T), T ≥ (1/(8γ))^{1/(4c)}/4, or equivalently, c ≥ ln(1/(8γ)) / (4 ln 4T).

Proof.

With some foresight, we set Δ = ln(4T)/ε and q = e^{−εΔ} = 1/(4T). Consider datasets in {0,1}^Δ, where d_0 = 0^Δ and d_1 = 1^Δ. Now we define a mechanism ℳ such that

ℳ(d_1) = (r,1) w.p. 1−q and (r′,0) w.p. q,   while   ℳ(d_0) = (r,1) w.p. q and (r′,0) w.p. 1−q.

Here r and r′ are two arbitrary distinct elements of R. Note that this specification on {d_0, d_1} satisfies ε-DP, as (1−q)/q ≤ 1/q = e^{ε|d_0−d_1|_H}. By the extension lemma (Proposition 2), this partial mechanism can be extended to a 2ε-DP mechanism on {0,1}^Δ.

Consider a run of 𝒜^ℳ(d_0). Let E be the event that 𝒜 on input d_0 makes at most 2T calls to ℳ(d_0). By Markov’s inequality, Pr[E] ≥ 1/2. Conditioned on this event E, the probability that (r,1) is the output of any of the at most 2T runs of ℳ(d_0) is at most 2qT ≤ 1/2. Since 𝒜 has never seen an (r,1) output in this case, it follows that Pr[𝒜(d_0) = (r,1) | E] ≤ 1/2, so that Pr[𝒜(d_0) = (r′,0) | E] ≥ 1/2. It follows that Pr[𝒜(d_0) = (r′,0)] ≥ 1/4.

As ℳ is 2ε-DP, the privacy property of 𝒜^ℳ implies that it is (2cε,δ)-DP. By Proposition 5, it follows that for δ < e^{−2cεΔ}/(8Δ),

Pr[𝒜(d_1) = (r′,0)] ≥ e^{−2cεΔ}/8.

By the utility property, the left hand side is at most γ. Since e^{−2cεΔ} ≥ e^{−4c·ln 4T}, it follows that γ ≥ (1/8)·(4T)^{−4c}. Rearranging, we get the claimed result. By using Proposition 6 in lieu of Proposition 5, we get a similar result for Renyi DP. We omit the straightforward proof, as well as the analogous corollaries of Theorem 9 and Theorem 10.
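To make the hard instance concrete, the following simulation (our own illustration, not part of the paper) instantiates ℳ(d_0) with q = 1/(4T) and estimates how often an algorithm making T calls to ℳ(d_0) ever sees the output (r,1); by a union bound this probability is at most Tq = 1/4.

```python
import random

def hard_mechanism_on_d0(q, rng):
    """The mechanism M from the proof, restricted to input d_0:
    it outputs (r, 1) only with probability q = 1/(4T)."""
    return ("r", 1) if rng.random() < q else ("r_prime", 0)

def prob_seeing_good_output(T, trials=100_000, seed=0):
    """Empirical probability that T calls to M(d_0) ever return (r, 1)."""
    rng = random.Random(seed)
    q = 1.0 / (4 * T)
    hits = sum(
        any(hard_mechanism_on_d0(q, rng)[1] == 1 for _ in range(T))
        for _ in range(trials)
    )
    return hits / trials

print(prob_seeing_good_output(T=50))   # about 0.22, below the union bound of 0.25
```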

Corollary 8.

In  Theorem 7, if we replace the Privacy condition by

Privacy (RDP): For any ε-DP mechanism ℳ: 𝒟^n → (R × ℝ), 𝒜^ℳ is (α,cε)-RDP.

Then if α ≥ 1 + ln(4T)/ε, T ≥ (1/(4^4·γ))^{1/(2ec)}/4, or equivalently, c ≥ ln(1/(4^4·γ)) / (2e·ln 4T).

3.1 Allowing more general meta-algorithms

One of the assumptions in Theorem 7 is that when 𝒜 is run on input d, its oracle calls to ℳ are also on input d. A more general oracle algorithm may, given input d, run ℳ on additional inputs derived from d. We next show that the lower bound continues to hold, up to constants. At a high level, the lower bound in Theorem 7 comes from the inability of 𝒜 on input d_0 to see any sample that is unlikely in ℳ(d_0) but likely in ℳ(d_1). We show how to embed such an instance in a mechanism over a slightly larger dataset, in a way where, on input d_0, finding an output that is good for the planted dataset is still difficult.

Theorem 9.

Let 𝒜 be an oracle algorithm that satisfies the following properties:

Privacy: For any ε-DP mechanism ℳ: 𝒟^n → (R × ℝ), 𝒜^ℳ is (cε,δ)-DP.

Utility: For any input d, Pr[val(𝒜^ℳ(d)) ≥ Median(val(ℳ(d)))] ≥ 1−γ.

Runtime: 𝒜^ℳ on input d makes at most T calls to ℳ (on any inputs) in expectation, for some T ≥ e^ε.

Then for ε ≤ 3, δ < εγ/(40 ln 8T), T ≥ (1/(8γ))^{1/(40c)}/8, or equivalently, c ≥ ln(1/(8γ)) / (40 ln 8T).

Proof.

With some foresight, we set Δ = ln(8T)/ε and q = e^{−εΔ} = 1/(8T). Fix a vector v ∈ {0,1}^{10Δ}, and let B_v = {u : |u−v|_H ≤ Δ} be the radius-Δ Hamming ball centered at v. We now define a mechanism ℳ_v such that:

ℳ_v(v) = (r,1) w.p. 1−q and (r′,0) w.p. q,   while for u ∉ B_v:   ℳ_v(u) = (r,1) w.p. q and (r′,0) w.p. 1−q.

As before, r and r′ are two arbitrary distinct elements of R. Note that this specification on {v} ∪ B_v^c satisfies ε-DP, as (1−q)/q ≤ 1/q ≤ e^{ε|u−v|_H} for u ∉ B_v. By the extension lemma (Proposition 2), this partial mechanism can be extended to a 2ε-DP mechanism on {0,1}^{10Δ}.

Consider a run of 𝒜^{ℳ_v}(d_0) where d_0 = 0^{10Δ}. Let E be the event that 𝒜^{ℳ_v} on input d_0 makes at most 2T calls to ℳ_v (on any inputs). By Markov’s inequality, Pr[E] ≥ 1/2. For v chosen uniformly at random, the probability that a single dataset d chosen by 𝒜^{ℳ_v}(d_0) lies in B_v is given by

|B_v| / 2^{10Δ} = ∑_{i=0}^{Δ} \binom{10Δ}{i} / 2^{10Δ} ≤ (10e/1024)^Δ ≤ e^{−3Δ} ≤ 1/(8T).

Thus the likelihood of seeing an (r,1) on any given call to ℳ_v(d), when v is chosen at random, is

Pr[ℳ_v(d) = (r,1)] ≤ Pr[d ∈ B_v] + Pr[ℳ_v(d) = (r,1) | d ∉ B_v]
≤ 1/(8T) + q ≤ 1/(4T).

Here and in the rest of the proof, the probability is taken over both the choice of v and the randomness of 𝒜^{ℳ_v} and ℳ_v. Conditioned on the event E, the probability that (r,1) is the output of any of the at most 2T runs of ℳ_v is at most 2T/(4T) ≤ 1/2. Since 𝒜 has never seen an (r,1) output in this case, it follows that Pr[𝒜^{ℳ_v}(d_0) = (r,1) | E] ≤ 1/2, so that Pr[𝒜^{ℳ_v}(d_0) = (r′,0) | E] ≥ 1/2. It follows that Pr[𝒜^{ℳ_v}(d_0) = (r′,0)] ≥ 1/4.

The rest of the proof is now essentially identical to the earlier proof. As ℳ_v is 2ε-DP, the privacy property of 𝒜^{ℳ_v} implies that it is (2cε,δ)-DP. By Proposition 5, it follows that for δ < e^{−20cεΔ}/(80Δ),

Pr[𝒜^{ℳ_v}(v) = (r′,0)] ≥ e^{−20cεΔ}/8.

By the utility property, the left hand side is at most γ. Since e^{−20cεΔ} ≥ e^{−40c·ln 8T}, it follows that γ ≥ (1/8)·(8T)^{−40c}. Rearranging, we get the claimed result.

3.2 Lower Bounds for Hyperparameter Tuning

We next show that the same essential arguments can be used to show a lower bound for private hyperparameter tuning.

Theorem 10.

Let 𝒜 be an oracle algorithm that satisfies the following properties:

Privacy: For any ε-DP mechanism ℳ: 𝒟^n × [K] → (R × ℝ), 𝒜^ℳ: 𝒟^n → (R × ℝ) is (cε,δ)-DP.

Utility: For any input d, Pr[val(𝒜^ℳ(d)) ≥ max_{k∈[K]} Median(val(ℳ(d,k)))] ≥ 1−γ.

Runtime: 𝒜^ℳ on input d makes at most TK calls to ℳ (on any inputs) in expectation, for some T ≥ e^ε.

Then for ε ≤ 3, δ < εγ/(40 ln 8T), T ≥ (1/(8γ))^{1/(40c)}/8, or equivalently, c ≥ ln(1/(8γ)) / (40 ln 8T).

Proof.

We set Δ = ln(8T)/ε and q = e^{−εΔ} = 1/(8T). For a vector v ∈ {0,1}^{10Δ}, let B_v = {u : |u−v|_H ≤ Δ} be the radius-Δ Hamming ball centered at v. For a parameter setting k ∈ [K], we now define a mechanism ℳ_{v,k} such that:

ℳ_{v,k}(v,k) = (r,1) w.p. 1−q and (r′,0) w.p. q,
for u ∉ B_v: ℳ_{v,k}(u,k) = (r,1) w.p. q and (r′,0) w.p. 1−q,
and for j ≠ k and all u ∈ {0,1}^{10Δ}: ℳ_{v,k}(u,j) = (r′,0) w.p. 1.

Here, r and r′ are two arbitrary distinct elements of R. By the extension lemma (Proposition 2), for each j ∈ [K], the partial mechanism ℳ_{v,k}(·,j) can be extended to a 2ε-DP mechanism on {0,1}^{10Δ}.

Consider a run of 𝒜^{ℳ_{v,k}}(d_0) where d_0 = 0^{10Δ}. Let E be the event that 𝒜^{ℳ_{v,k}} on input d_0 makes at most 2TK calls to ℳ_{v,k} (on any inputs). By Markov’s inequality, Pr[E] ≥ 1/2. Each call to ℳ_{v,k} made by 𝒜^{ℳ_{v,k}}(d_0) consists of a single dataset d and a single j ∈ [K]. When v and k are chosen uniformly at random, the probability that d ∈ B_v is

|B_v| / 2^{10Δ} = ∑_{i=0}^{Δ} \binom{10Δ}{i} / 2^{10Δ} ≤ (10e/1024)^Δ ≤ e^{−3Δ} ≤ 1/(8T).

If the j chosen by 𝒜^{ℳ_{v,k}}(d_0) is different from k, the likelihood of seeing an (r,1) is zero. Thus the likelihood of seeing an (r,1) on a given call to ℳ_{v,k} is

Pr[ℳ_{v,k}(d,j) = (r,1)] ≤ Pr[j = k ∧ d ∈ B_v]
+ Pr[j = k]·Pr[ℳ_{v,k}(d,k) = (r,1) | j = k ∧ d ∉ B_v]
≤ 1/(8TK) + q/K ≤ 1/(4TK).

Conditioned on the event E, the probability that (r,1) is the output of any of the at most 2TK runs of ℳ_{v,k} is at most 2TK/(4TK) ≤ 1/2. Since 𝒜 has never seen an (r,1) output in this case, it follows that Pr[𝒜^{ℳ_{v,k}}(d_0) = (r,1) | E] ≤ 1/2, so that Pr[𝒜^{ℳ_{v,k}}(d_0) = (r′,0) | E] ≥ 1/2. It follows that Pr[𝒜^{ℳ_{v,k}}(d_0) = (r′,0)] ≥ 1/4.

The rest of the proof is essentially identical. As ℳ_{v,k}(·,j) is 2ε-DP for each j, the privacy property of 𝒜^{ℳ_{v,k}} implies that it is (2cε,δ)-DP. By Proposition 5, it follows that for small enough δ,

Pr[𝒜^{ℳ_{v,k}}(v) = (r′,0)] ≥ e^{−20cεΔ}/8.

By the utility property, the left hand side is at most γ. Since e^{−20cεΔ} ≥ e^{−40c·ln 8T}, it follows that γ ≥ (1/8)·(8T)^{−40c}. Rearranging, we get the claimed result.

4 Upper Bounds Trade-offs

As mentioned earlier, two extreme points on the trade-off between the privacy overhead and the run time overhead are known. We next show the trade-off that is achievable by a simple combination of the two basic approaches. We consider the goal of matching the median score, which is the common case for private repetition (labeled Repetition below), as well as the goal of matching the (1 − 1/K)-th quantile, which is the case for metaselection and hyperparameter tuning (labeled Metaselection below). We first state the results one gets from simple repetition.

Theorem 11.

Suppose that ℳ: 𝒟^n → (R × ℝ) is (ε,δ)-DP. Then for an integer T, the algorithm 𝒜maxT that on input d runs ℳ(d) T times and releases the output with the highest quality score satisfies

  • Privacy: 𝒜maxT: 𝒟^n → (R × ℝ) is (Tε, Tδ)-DP, as well as (Tε² + ε√(2T·ln(1/δ′)), Tδ + δ′)-DP, for any δ′ ∈ (0,1).

  • Runtime: 𝒜maxT on input d makes exactly T calls to ℳ.

  • Utility (Repetition): For any input d, if T ≥ log₂(1/γ), then

    Pr[val(𝒜maxT(d)) ≥ Median(val(ℳ(d)))] ≥ 1−γ.
  • Utility (Metaselection): For any input d and K > 1, if T ≥ K·ln(1/γ), then

    Pr[val(𝒜maxT(d)) ≥ q_{1−1/K}(ℳ(d))] ≥ 1−γ,

    where the quantile q_{1−1/K} is such that Pr[val(ℳ(d)) ≥ q_{1−1/K}] ≥ 1/K.

Proof.

The privacy follows from basic composition, advanced composition, and post-processing. The utility and run time analyses are straightforward. The following result was shown by Liu and Talwar [20].

Theorem 12.

Suppose that ℳ: 𝒟^n → (R × ℝ) is ε-DP and γ > 0. Then the algorithm 𝒜LT,γ satisfies

  • Privacy: 𝒜LT,γ: 𝒟^n → (R × ℝ) is 3ε-DP.

  • Runtime: 𝒜LT,γ on input d makes in expectation O(1/γ) calls to ℳ.

  • Utility (Repetition): For any input d, Pr[val(𝒜LT,γ(d)) ≥ Median(val(ℳ(d)))] ≥ 1−γ.

  • Utility (Metaselection): For any input d and K > 1, the algorithm 𝒜LT,γ satisfies Pr[val(𝒜LT,γ(d)) ≥ q_{1−1/K}(ℳ(d))] ≥ 1−Kγ, where the quantile q_{1−1/K} is such that Pr[val(ℳ(d)) ≥ q_{1−1/K}] ≥ 1/K.
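A minimal sketch of the random-stopping idea behind 𝒜LT,γ (our own rendering; see [20] for the exact algorithm and its 3ε privacy analysis): run the base mechanism until a γ-biased coin comes up heads, and release the best output seen.

```python
import random

def lt_random_stopping(private_mechanism, dataset, gamma, rng=random.Random(0)):
    """Random-stopping selection in the spirit of Liu and Talwar [20].
    The number of runs is Geometric(gamma), so the expected number of calls
    to the base mechanism is 1/gamma; the best (output, value) pair seen is
    released.  The 3*eps privacy accounting is proved in [20]."""
    best = None
    while True:
        out = private_mechanism(dataset, rng)      # returns an (output, value) pair
        if best is None or out[1] > best[1]:
            best = out
        if rng.random() < gamma:                   # halt with probability gamma
            return best
```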

These two algorithms can be combined, by using Theorem 12 to boost the success probability to 1−γ^{1/c}, and then repeating that algorithm as in Theorem 11 to further boost the success probability to (1−γ).

Theorem 13.

Suppose that ℳ: 𝒟^n → (R × ℝ) is ε-DP and let c > 1 be an integer. Then the algorithm 𝒜 that runs 𝒜maxT (for T = c) with an 𝒜LT,γ^{1/c} oracle satisfies

  • Privacy: 𝒜: 𝒟^n → (R × ℝ) is 3cε-DP.

  • Runtime: 𝒜 on input d makes in expectation O(c·γ^{−1/c}) calls to ℳ.

  • Utility: For any input d, Pr[val(𝒜(d)) ≥ Median(val(ℳ(d)))] ≥ 1−γ.

Proof.

By Theorem 12, 𝒜LT,γ^{1/c} is 3ε-DP. Theorem 11 now implies that 𝒜 is 3cε-DP. For the chosen parameters, the utility bound in Theorem 12 implies that Pr[val(𝒜LT,γ^{1/c}(d)) ≥ Median(val(ℳ(d)))] ≥ 1−γ^{1/c}. Since this algorithm is repeated c times in 𝒜, the failure probability is reduced to γ. Finally, the run time bound follows by combining the run time results in Theorem 12 and Theorem 11. We note that for c < log(1/γ)/log log(1/γ), c^c < 1/γ, so that this bound of O(c·γ^{−1/c}) is O(γ^{−2/c}). Thus it matches the lower bound in Theorem 9 up to constant factors in c.
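A sketch of the hybrid algorithm of Theorem 13, assuming the random-stopping subroutine sketched after Theorem 12: boost the inner failure probability to γ^{1/c}, then take the best of c independent repetitions (names are illustrative).

```python
import random

def lt_random_stopping(private_mechanism, dataset, gamma, rng):
    """Random-stopping subroutine (see the sketch after Theorem 12)."""
    best = None
    while True:
        out = private_mechanism(dataset, rng)
        if best is None or out[1] > best[1]:
            best = out
        if rng.random() < gamma:
            return best

def hybrid_repetition(private_mechanism, dataset, gamma, c, rng=random.Random(0)):
    """Hybrid of Theorem 13: run the random-stopping algorithm with failure
    probability gamma**(1/c), repeat it c times, and keep the best output.
    Privacy overhead is about 3*c*eps; expected calls are O(c * gamma**(-1/c))."""
    inner_gamma = gamma ** (1.0 / c)
    best = None
    for _ in range(c):
        out = lt_random_stopping(private_mechanism, dataset, inner_gamma, rng)
        if best is None or out[1] > best[1]:
            best = out
    return best
```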

An identical argument yields the following result for metaselection.

Theorem 14.

Suppose that ℳ: 𝒟^n → (R × ℝ) is ε-DP and let c > 1 be an integer. Then the algorithm 𝒜 that runs 𝒜maxT (for T = c) with an 𝒜LT,γ^{1/c}/K oracle satisfies

  • Privacy: 𝒜: 𝒟^n → (R × ℝ) is 3cε-DP.

  • Runtime: 𝒜 on input d makes in expectation O(Kc·γ^{−1/c}) calls to ℳ.

  • Utility: For any input d, Pr[val(𝒜(d)) ≥ q_{1−1/K}(ℳ(d))] ≥ 1−γ, where the quantile q_{1−1/K} is such that Pr[val(ℳ(d)) ≥ q_{1−1/K}] ≥ 1/K.

Proof.

By Theorem 12, 𝒜LT,γ^{1/c}/K is 3ε-DP. Theorem 11 now implies that 𝒜 is 3cε-DP. For the chosen parameters, the utility bound in Theorem 12 implies that Pr[val(𝒜LT,γ^{1/c}/K(d)) ≥ q_{1−1/K}(ℳ(d))] ≥ 1 − K·(γ^{1/c}/K) = 1 − γ^{1/c}. Since this algorithm is repeated c times in 𝒜, the failure probability is reduced to γ. Finally, the run time bound follows by combining the run time results in Theorem 12 and Theorem 11. This result improves on 𝒜maxT in terms of privacy cost and gives the points plotted in Figure 1 (right). As in the case of repetition, this result is tight up to constants in c all the way up to c = log(1/γ)/log log(1/γ).

5 Conclusions

Differentially private repetition is a fundamental problem in the design of differentially private algorithms, and private hyperparameter tuning is of great practical importance. Our work shows new trade-offs between privacy overhead and computational overhead, and our main result is a lower bound showing that there is no general algorithm that can do significantly better. Indeed we show that for constant privacy overhead, the computational overhead must be polynomial in 1/γ for some private algorithms.

It is natural to ask if there are reasonable restrictions one can place on the inputs or the private algorithms of interest, for which better repetition theorems and/or hyperparameter tuning algorithms are possible. Such beyond-worst-case results are not uncommon in differential privacy [23, 8, 31, 2]. There are some assumptions that hyperparameter tuning algorithms either implicitly [30] or explicitly [16] make in the non-private setting and one may ask if such assumptions help with better trade-offs for private hyperparameter search. Even absent computational constraints, the 2× or 3× privacy cost overhead in 𝒜LT can be large. Once again, while it is unavoidable in the worst-case, one may hope to design algorithms that do better for non-worst-case instances. We leave these important questions for future work.

References

  • [1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Conference on Computer and Communications Security, pages 308–318, 2016. doi:10.1145/2976749.2978318.
  • [2] Hilal Asi and John C Duchi. Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 14106–14117. Curran Associates, Inc., 2020. URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/a267f936e54d7c10a2bb70dbe6ad7a89-Paper.pdf.
  • [3] Raef Bassily, Kobbi Nissim, Uri Stemmer, and Abhradeep Thakurta. Practical locally private heavy hitters. Journal of Machine Learning Research, 21(16):1–42, 2020. URL: http://jmlr.org/papers/v21/18-786.html.
  • [4] Christian Borgs, Jennifer Chayes, Adam Smith, and Ilias Zadik. Revealing network structure, confidentially: Improved rates for node-private graphon estimation. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 533–543, 2018. doi:10.1109/FOCS.2018.00057.
  • [5] Gavin Brown, Jonathan Hayase, Samuel Hopkins, Weihao Kong, Xiyang Liu, Sewoong Oh, Juan C Perdomo, and Adam Smith. Insufficient statistics perturbation: Stable estimators for private least squares extended abstract. In Shipra Agrawal and Aaron Roth, editors, Proceedings of Thirty Seventh Conference on Learning Theory, volume 247 of Proceedings of Machine Learning Research, pages 750–751. PMLR, 30 June–03 July 2024. URL: https://proceedings.mlr.press/v247/brown24b.html.
  • [6] Edith Cohen and Xin Lyu. The target-charging technique for privacy analysis across interactive computations. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 62139–62168. Curran Associates, Inc., 2023. URL: https://proceedings.neurips.cc/paper_files/paper/2023/file/c3fe2a07ec47b89c50e89706d2e23358-Paper-Conference.pdf.
  • [7] Michael Dinitz, Satyen Kale, Silvio Lattanzi, and Sergei Vassilvitskii. Almost tight bounds for differentially private densest subgraph. In Yossi Azar and Debmalya Panigrahi, editors, Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2025, New Orleans, LA, USA, January 12-15, 2025, pages 2908–2950. SIAM, 2025. doi:10.1137/1.9781611978322.94.
  • [8] Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC ’09, pages 371–380, New York, NY, USA, 2009. Association for Computing Machinery. doi:10.1145/1536414.1536466.
  • [9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. Journal of Privacy and Confidentiality, 7(3):17–51, 2017. doi:10.29012/jpc.v7i3.405.
  • [10] Cynthia Dwork, Moni Naor, Omer Reingold, Guy N. Rothblum, and Salil Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC ’09, pages 381–390, New York, NY, USA, 2009. Association for Computing Machinery. doi:10.1145/1536414.1536467.
  • [11] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3 & 4):211–407, 2014. doi:10.1561/0400000042.
  • [12] Vitaly Feldman, Audra McMillan, Satchit Sivakumar, and Kunal Talwar. Instance-optimal private density estimation in the wasserstein distance. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 90061–90131. Curran Associates, Inc., 2024. URL: https://proceedings.neurips.cc/paper_files/paper/2024/file/a406c9f8eb70032a21110a4d86735ab9-Paper-Conference.pdf.
  • [13] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–1145, November 1995. doi:10.1145/227683.227684.
  • [14] Anupam Gupta, Katrina Ligett, Frank McSherry, Aaron Roth, and Kunal Talwar. Differentially private combinatorial optimization. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’10, pages 1106–1125, USA, 2010. Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611973075.90.
  • [15] Moritz Hardt and Guy N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 61–70, 2010. doi:10.1109/FOCS.2010.85.
  • [16] Elad Hazan, Adam Klivans, and Yang Yuan. Hyperparameter optimization: a spectral approach. In International Conference on Learning Representations, 2018. URL: https://openreview.net/forum?id=H1zriGeCZ.
  • [17] Shlomi Hod and Ran Canetti. Differentially Private Release of Israel’s National Registry of Live Births . In 2025 IEEE Symposium on Security and Privacy (SP), pages 100–100, Los Alamitos, CA, USA, May 2025. IEEE Computer Society. doi:10.1109/SP61157.2025.00101.
  • [18] Thomas Holenstein. Parallel repetition: simplifications and the no-signaling case. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, STOC ’07, pages 411–419, New York, NY, USA, 2007. Association for Computing Machinery. doi:10.1145/1250790.1250852.
  • [19] Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43:439–561, 2006.
  • [20] Jingcheng Liu and Kunal Talwar. Private selection from private candidates. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 298–309, New York, NY, USA, 2019. Association for Computing Machinery. doi:10.1145/3313276.3316377.
  • [21] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’07, pages 94–103, USA, 2007. IEEE Computer Society. doi:10.1109/FOCS.2007.41.
  • [22] Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275, 2017. doi:10.1109/CSF.2017.11.
  • [23] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, STOC ’07, pages 75–84, New York, NY, USA, 2007. Association for Computing Machinery. doi:10.1145/1250790.1250803.
  • [24] Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-supervised knowledge transfer for deep learning from private training data. In International Conference on Learning Representations, 2017. URL: https://openreview.net/forum?id=HkwoSDPgg.
  • [25] Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Ulfar Erlingsson. Scalable private learning with PATE. In International Conference on Learning Representations, 2018. URL: https://openreview.net/forum?id=rkZB1XbRZ.
  • [26] Nicolas Papernot and Thomas Steinke. Hyperparameter tuning with renyi differential privacy. In International Conference on Learning Representations, 2022. URL: https://openreview.net/forum?id=-70L8lpp9DF.
  • [27] Ran Raz. A parallel repetition theorem. In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’95, pages 447–456, New York, NY, USA, 1995. Association for Computing Machinery. doi:10.1145/225058.225181.
  • [28] Lucas Rosenblatt, Xiaoyan Liu, Samira Pouyanfar, Eduardo de Leon, Anuj Desai, and Joshua Allen. Differentially private synthetic data: Applied evaluations and enhancements. arXiv preprint arXiv:2011.05537, 2020. arXiv:2011.05537.
  • [29] Aaron Roth and Tim Roughgarden. Interactive privacy via the median mechanism. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, STOC ’10, pages 765–774, New York, NY, USA, 2010. Association for Computing Machinery. doi:10.1145/1806689.1806794.
  • [30] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. URL: https://proceedings.neurips.cc/paper_files/paper/2012/file/05311655a15b75fab86956663e1819cd-Paper.pdf.
  • [31] Abhradeep Guha Thakurta and Adam Smith. Differentially private feature selection via stability arguments, and the robustness of the lasso. In Shai Shalev-Shwartz and Ingo Steinwart, editors, Proceedings of the 26th Annual Conference on Learning Theory, volume 30 of Proceedings of Machine Learning Research, pages 819–850, Princeton, NJ, USA, 12–14 June 2013. PMLR. URL: https://proceedings.mlr.press/v30/Guha13.html.
  • [32] Salil Vadhan. The Complexity of Differential Privacy, pages 347–450. Springer, Yehuda Lindell, ed., 2017. doi:10.1007/978-3-319-57048-8_7.
  • [33] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Modeling tabular data using conditional gan. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL: https://proceedings.neurips.cc/paper_files/paper/2019/file/254ed7d2de3b23ab10936522dd547b78-Paper.pdf.
  • [34] Jinsung Yoon, James Jordon, and Mihaela van der Schaar. PATE-GAN: Generating synthetic data with differential privacy guarantees. In International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=S1zk9iRqF7.