
Uniformity Testing When You Have the Source Code

Clément L. Canonne (The University of Sydney, Australia), Robin Kothari (Google Quantum AI, Santa Barbara, CA, USA), Ryan O'Donnell (Carnegie Mellon University, Pittsburgh, PA, USA)
Abstract

We study quantum algorithms for verifying properties of the output probability distribution of a classical or quantum circuit, given access to the source code that generates the distribution. We consider the basic task of uniformity testing, which is to decide if the output distribution is uniform on [d] or ε-far from uniform in total variation distance. More generally, we consider identity testing, which is the task of deciding if the output distribution equals a known hypothesis distribution, or is ε-far from it. For both problems, the previous best known upper bound was O(min{d^{1/3}/ε², d^{1/2}/ε}). Here we improve the upper bound to O(min{d^{1/3}/ε^{4/3}, d^{1/2}/ε}), which we conjecture is optimal.

Keywords and phrases:
distribution testing, uniformity testing, quantum algorithms
Funding:
Clément L. Canonne: Supported by an ARC DECRA (DE230101329).
Ryan O’Donnell: Supported by ARO grant W911NF2110001 and by a gift from Google Quantum AI.
Copyright and License:
© Clément L. Canonne, Robin Kothari, and Ryan O'Donnell; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms; Theory of computation → Quantum computation theory
Related Version:
Full Version: https://arxiv.org/abs/2411.04972
Editor:
Bill Fefferman

1 Introduction

For 30 years we have known that quantum computers can solve certain problems significantly faster than any known classical algorithm. Traditionally, most of the research in this area has focused on decision problems (like SAT) or function problems (like Factoring), where for each possible input there is a unique “correct” output. However, we have also found that quantum computers can yield speedups for the task of sampling from certain probability distributions. Prominent examples include boson sampling [1] and random circuit sampling [8]. Sampling tasks have seemed more natural for NISQ-era quantum computation, and indeed many of the first candidate experimental demonstrations of quantum advantage have been for sampling problems [6].

One of the downsides of sampling problems is the challenge of verifying the output of an algorithm, whether classical or quantum, that claims to sample from a certain distribution. As a simple example, consider a classical or quantum algorithm that implements a supposed hash function with output alphabet [d] ≔ {1, …, d}. The algorithm designer claims that the output distribution of this hash function is uniform on [d]. If 𝐩 denotes the actual output distribution of the algorithm, and 𝐮_d denotes the uniform distribution on [d], then we would like to test whether 𝐩 = 𝐮_d, and reject the claim if 𝐩 is in fact ε-far from 𝐮_d in total variation distance, meaning ½‖𝐩 − 𝐮_d‖_1 > ε. (We will also consider other distance measures in this work, since the complexity of the testing task is sensitive to this choice.)

This verification task is called "uniformity testing" (in total variation distance) and its complexity is well studied in the classical literature. If we only have access to samples from 𝐩, but are not allowed to inspect the algorithm that produces these samples, it is known that Θ(d^{1/2}/ε²) samples are necessary and sufficient to solve this problem; there are various classical algorithms that achieve this bound (starting with that of [28]; see, e.g., [11] for a detailed survey and discussion), and it is also not possible to do better with a quantum algorithm. But what if – as in the examples above – we do have access to the algorithm that produces 𝐩? Can we improve on this complexity if we have access to the "source code" of the algorithm?
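For intuition, here is a minimal sketch of a classical collision-based uniformity tester in the no-source-code model. This is a standard textbook-style approach, not the specific algorithm of [28]; the function names, constants, and threshold are ours, chosen loosely rather than optimized for the tight Θ(d^{1/2}/ε²) bound.

```python
import random
from collections import Counter

def collision_stat(samples):
    """Unbiased estimate of ||p||_2^2: the fraction of sample pairs that collide."""
    n = len(samples)
    pairs = sum(c * (c - 1) for c in Counter(samples).values()) / 2
    return pairs / (n * (n - 1) / 2)

def uniformity_test(sampler, d, eps, n):
    """Accept (True) iff the samples look uniform.  Uniform p has ||p||_2^2 = 1/d,
    while d_TV(p, u_d) >= eps forces ||p||_2^2 >= (1 + 4 eps^2)/d by Cauchy-Schwarz;
    we threshold halfway between."""
    samples = [sampler() for _ in range(n)]
    return collision_stat(samples) <= (1 + 2 * eps**2) / d

random.seed(0)
d, eps = 1000, 0.25
n = 20 * int(d**0.5 / eps**2)   # Theta(sqrt(d)/eps^2), with a generous constant

uniform = lambda: random.randrange(d)

# half the elements get mass (1+2*eps)/d, half get (1-2*eps)/d (so d_TV from uniform = eps)
def far():
    if random.random() < (1 + 2 * eps) / 2:
        return random.randrange(d // 2)
    return d // 2 + random.randrange(d - d // 2)

print(uniformity_test(uniform, d, eps, n))  # True
print(uniformity_test(far, d, eps, n))      # False
```

With these (non-optimized) parameters the estimator's fluctuations are far smaller than the gap between 1/d and (1+4ε²)/d, so the two cases separate cleanly.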

Having the source code

To clarify, the "source code" for a classical randomized sampling algorithm means a randomized circuit (with no input) whose output is one draw from 𝐩. More generally, the "source code" for a quantum sampling algorithm means a unitary quantum circuit (with all input qubits fixed to |0⟩) which gives one draw from 𝐩 when some of its output bits are measured in the standard basis and the rest are discarded. (This is sometimes termed the "purified quantum query access model", and is the most natural and general model. The "quantum string oracle", referenced later in Table 1, refers to a situation in which one assumes a very specific type of source code for 𝐩, thus making algorithmic tasks easier. See Section 3 for details and [7] for a thorough discussion.) The simplest way to use the code C for 𝐩 is to run it, obtaining one sample. If C has size S, then getting one sample this way has cost S. Another way to use the code C is to deterministically compute all its output probabilities; this gives one perfect information about 𝐩, but has cost bounded by 2^S. But quantum computing has suggested a third way to use the code: "running it in reverse". For example, Grover's original algorithm [18] can be seen as distinguishing two possibilities for 𝐩 on [2], namely 𝐩_1 = 0 or 𝐩_1 = 1/N, while using only O(N^{1/2}) forwards/backwards executions of C. The total cost here is O(N^{1/2})·S, the same as the cost for O(N^{1/2}) samples.

We suggest that the utility of "having the source code" for distribution testing problems remains notably underexplored. Indeed, there is significant room for improvement in the bounds for even the most canonical of all such problems: uniformity testing. Our main theorem is the following:

Theorem 1.

There is a computationally efficient quantum algorithm for uniformity testing with the following guarantees: given ε ≥ 1/√d, the algorithm makes O(d^{1/3}/ε^{4/3}) uses of "the code" for an unknown distribution 𝐩 over [d], and distinguishes with probability at least .99 between

(1) 𝐩 = 𝐮_d,   and   (2) d_TV(𝐩, 𝐮_d) > ε.    (1)

The main idea behind this theorem is to combine very careful classical probabilistic analysis with a black-box use of Quantum Mean Estimation (QME) [19, 9, 21, 25, 20, 22]; see Section 2 for further discussion. Table 1 below compares our result to prior work on the problem.

Table 1: "Sample" complexity for uniformity testing with respect to total variation distance.

  Reference | Large ε regime                   | Small ε regime                         | Access model
  [28, 4]   | Θ(d^{1/2}/ε²)                    | Θ(d^{1/2}/ε²)                          | Classical, no source code
  [10]      | O(d^{1/3}) for ε = Θ(1)          |                                        | Quantum string oracle
  [12]      | O(d^{1/3}/ε²)*                   |                                        | Quantum string oracle
  [15]      |                                  | O((d^{1/2}/ε)·log³(d/ε)·log log(d/ε))  | Source code
  [24]      |                                  | O(d^{1/2}/ε)                           | Source code
  This work | O(d^{1/3}/ε^{4/3}) for ε ≥ 1/√d  |                                        | Source code

* The work states a bound of O(d^{1/3}/ε^{4/3}), but adds that ε must be constant.

Table 1 has two columns because it seems that different algorithms are necessary depending on how d and ε relate. (Interestingly, this is not the case in the classical no-source-code model.) Thus combining our new result with that of [24], the best known upper bound becomes O(min{d^{1/3}/ε^{4/3}, d^{1/2}/ε}). We remark that although [24]'s algorithm/analysis is already simple, we give an alternative simple algorithm and analysis achieving O(d^{1/2}/ε) in Appendix A, employing the classical analysis + QME approach used in the proof of our main theorem.

Lower bounds?

As for lower bounds (holding even in the quantum string oracle model): complexity Ω(1/ε) is necessary even in the case of constant d = 2, following from work of [26]; and [12] showed a lower bound of Ω(d^{1/3}) even in the case of constant ε, by reduction from the collision problem [2]. For reasons discussed in Section 2, we make the (somewhat bold) conjecture that our new upper bound is in fact tight for all d and ε:

Conjecture 2.

Any algorithm that distinguishes 𝐩 = 𝐮_d from d_TV(𝐩, 𝐮_d) > ε with success probability at least .99 requires Ω(min{d^{1/3}/ε^{4/3}, d^{1/2}/ε}) uses of the code for 𝐩. (Moreover, we conjecture this lower bound in the stronger quantum string oracle model.)

Identity testing

Several prior works in this area have also studied the following natural generalization of uniformity testing: testing identity of the unknown distribution 𝐩 to a known hypothesis distribution 𝐪. An example application of this might be when 𝐪 is a Porter–Thomas-type distribution arising as the ideal output of a random quantum circuit. Luckily, fairly recent work has given a completely generic reduction from any fixed identity testing problem to the uniformity testing problem; see [16], or [11, Section 2.2.3]. We can therefore immediately extend our new theorem to the general identity-testing setting:

Corollary 3.

There is a computationally efficient quantum algorithm for identity testing to a reference distribution 𝐪 over [d] with the following guarantees: The algorithm makes O(min{d^{1/3}/ε^{4/3}, d^{1/2}/ε}) uses of "the code" for an unknown distribution 𝐩 over [d], and distinguishes with probability at least .99 between

(1) 𝐩 = 𝐪,   and   (2) d_TV(𝐩, 𝐪) > ε.    (2)

(For completeness, we verify in Appendix C that the black-box reduction does indeed carry through in our setting, preserving access to "the code".)

More fine-grained results

In proving our main theorem, we will in fact prove a strictly stronger version, one which is more fine-grained in two ways:

  1. Tolerance: Not only does our test accept with high probability when 𝐩 = 𝐮_d, it also accepts with high probability when 𝐩 is sufficiently close to 𝐮_d.

  2. Stricter distance measure: Not only does our test reject with high probability when d_TV(𝐩, 𝐮_d) > ε, it also rejects with high probability when d_H(𝐩, 𝐮_d) > ε. (This is stronger, since d_TV(𝐩, 𝐪) ≤ d_H(𝐩, 𝐪) always.)

To elaborate, recall the below chain of inequalities, which also includes KL- and χ²-divergence. (We review probability distance measures in Section 3.)

d_TV(𝐩, 𝐪)² ≤ d_H²(𝐩, 𝐪) ≤ KL(𝐩 ‖ 𝐪) ≤ χ²(𝐩 ‖ 𝐪).    (3)

The strictly stronger version of Theorem 1 that we prove is:

Theorem 4.

There is a computationally efficient quantum algorithm for uniformity testing with the following guarantees: For 1/d ≤ θ ≤ 1, the algorithm makes O(d^{1/3}/θ^{2/3}) uses of "the code" for an unknown distribution 𝐩 over [d], and distinguishes with probability at least .99 between

(1) χ²(𝐩 ‖ 𝐮_d) ≤ .99θ and ‖𝐩‖_∞ ≤ 100/d,   and   (2) d_H²(𝐩, 𝐮_d) > θ.    (4)

We remark that most prior works on uniformity testing [10, 12, 15, 24] also had some additional such fine-grained aspects, beyond what is stated in Table 1.

Additional results

Speaking of χ²-divergence, we mention two additional results we prove at the end of our work. These results additionally inform our Conjecture 2.

First, as mentioned earlier, in Appendix A we give an alternative proof of the O(d^{1/2}/ε) upper bound of [24], and – like in that work – our result is tolerant with respect to χ²-divergence. That is, we prove the strictly stronger result that for θ ≥ 1/d, one can use the code O(d^{1/2}/θ^{1/2}) times to distinguish χ²(𝐩 ‖ 𝐮_d) ≤ cθ from χ²(𝐩 ‖ 𝐮_d) > θ (for some constant c > 0).

Second, recall that χ²(𝐩 ‖ 𝐮_d) can be nearly as large as d. For example, χ²(𝐮_S ‖ 𝐮_d) = d/r − 1 for 𝐮_S the uniform distribution on any set S ⊆ [d] of size r. Thus it makes sense to consider the uniformity testing problem even with respect to a χ²-divergence threshold θ that exceeds 1. In Appendix B we show (albeit only in the quantum string oracle model) that for θ ≥ 1, one can use the code O(d^{1/3}/θ^{1/3}) times to distinguish χ²(𝐩 ‖ 𝐮_d) ≤ cθ from χ²(𝐩 ‖ 𝐮_d) > θ, and this is optimal.
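The identity χ²(𝐮_S ‖ 𝐮_d) = d/r − 1 used above is a one-line computation; a quick numeric spot-check (helper name ours):

```python
def chi2(p, q):
    """Chi-squared divergence: sum over i of (p_i - q_i)^2 / q_i."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

d, r = 12, 3
u_d = [1 / d] * d
u_S = [1 / r] * r + [0.0] * (d - r)   # uniform on a size-r subset S of [d]

assert abs(chi2(u_S, u_d) - (d / r - 1)) < 1e-12
print(chi2(u_S, u_d))   # 3.0  (= d/r - 1)
```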

2 Technical overview of our proof

Our main algorithm is concerned with achieving the best possible ε-dependence for uniformity testing while maintaining a d-dependence of d^{1/3}; in this way, it is best compared with the older works of [10, 12], the latter of which achieves complexity O(d^{1/3}/ε²), as well as with the classical (no-source-code) algorithm achieving complexity O(d^{1/2}/ε²). In fact, all four algorithms here are almost the same (except in terms of the number of samples they use). Let us describe our viewpoint on this common methodology.
We consider the algorithm as being divided into two Phases, and we may as well assume each Phase uses n samples. Phase 1 will have three properties:

  • It will make n black-box draws from 𝐩 (i.e., the source code is not used in Phase 1).

  • Using these draws, Phase 1 will end by constructing a certain "random variable" – in the technical sense of a function Y : [d] → ℝ.

  • The mean of this random variable Y, vis-à-vis the unknown distribution 𝐩, will ideally be close to χ²(𝐩 ‖ 𝐮_d) = d‖𝐩 − 𝐮_d‖_2². That is, ideally μ ≔ 𝔼_𝐩[Y] = Σ_{j=1}^d 𝐩_j Y(j) ≈ χ²(𝐩 ‖ 𝐮_d).

Phase 2 then performs a mean estimation algorithm on Y (vis-à-vis 𝐩) to get an estimate of μ and therefore of χ²(𝐩 ‖ 𝐮_d). Ideally, the resulting overall algorithm is not just a uniformity tester, but a χ²-divergence-from-uniformity estimator. This could then be weakened to a TV-distance uniformity tester using the inequality d_TV(𝐩, 𝐮_d)² ≤ χ²(𝐩 ‖ 𝐮_d).

The mean estimation algorithm used in Phase 2 differs depending on whether one has the source code or not. In the classical (no source code) model, one simply uses the naive mean estimation algorithm based on n more black-box samples; by Chebyshev's inequality, this will (with high probability) give an estimate of μ to within ±O(σ/n^{1/2}), where σ ≔ stddev_𝐩[Y] = (Σ_{j=1}^d 𝐩_j (Y(j) − μ)²)^{1/2}. In the case of a quantum tester with source code access, we can use a Quantum Mean Estimation (QME) routine; in particular, the one from [22] will (with high probability) yield an estimate of μ to within ±O(σ/n). (This QME routine was not available at the time of [10, 12], which had to make do with Quantum Approximate Counting [9] – essentially, QME for Bernoulli random variables. But this is not the source of our improvement; one can obtain our main theorem with only a (polylog d)-factor loss using just Quantum Approximate Counting.)

A subtle aspect of this overall plan is that the mean μ and standard deviation σ of Y are themselves random variables (in the usual sense), where the randomness comes from Phase 1. Thus it is natural to analyze 𝔼_{Phase 1}[μ] and 𝔼_{Phase 1}[σ]. Of course, these depend on the definition of Y, which we now reveal: Y(j) = (d/n)·X_j − 1, where X_j denotes the number of times j ∈ [d] was drawn in Phase 1. The point of this definition of Y is that a short calculation implies

𝔼_{Phase 1}[μ] = χ²(𝐩 ‖ 𝐮_d);    (5)

that is, the random variable μ is an unbiased estimator for our quantity of interest, the χ2-divergence of 𝐩 from 𝐮d. This is excellent, because although the algorithm does not see μ at the end of Phase 1, it will likely get a good estimate of it at the end of Phase 2…so long as (the random variable) σ is small.
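To make the skeleton concrete, here is a purely classical simulation of the two Phases, with QME replaced by naive sampling in Phase 2; it illustrates only the unbiasedness 𝔼_{Phase 1}[μ] = χ²(𝐩 ‖ 𝐮_d), not the quantum speedup. All names and constants below are ours, for illustration only.

```python
import random
from collections import Counter

def chi2_uniform(p):
    """chi^2(p || u_d) = d * ||p - u_d||_2^2."""
    d = len(p)
    return d * sum((pi - 1 / d) ** 2 for pi in p)

def two_phase_estimate(p, n, rng):
    d = len(p)
    draw = lambda: rng.choices(range(d), weights=p, k=1)[0]
    # Phase 1: n black-box draws; build Y(j) = (d/n) * X_j - 1
    X = Counter(draw() for _ in range(n))
    Y = [(d / n) * X[j] - 1 for j in range(d)]
    # Phase 2: estimate mu = E_p[Y] -- here a classical stand-in for QME,
    # so the error is ~sigma/sqrt(n) rather than QME's ~sigma/n
    return sum(Y[draw()] for _ in range(n)) / n

rng = random.Random(42)
p = [0.3] + [0.1] * 7                  # d = 8, chi^2(p || u_8) = 0.28
est = sum(two_phase_estimate(p, 400, rng) for _ in range(400)) / 400
print(round(chi2_uniform(p), 3), round(est, 3))   # 0.28 and ~0.28
```

Averaging over independent runs, the estimate concentrates around the true χ²-divergence, reflecting the unbiasedness of μ as an estimator.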

We therefore finally have two sources of uncertainty about our final error (in estimating χ²(𝐩 ‖ 𝐮_d)):

  1. Although 𝔼_{Phase 1}[μ] = χ²(𝐩 ‖ 𝐮_d), the random variable μ may have fluctuated around its expectation at the end of Phase 1. One way to control this would be to bound Var_{Phase 1}[μ] (and then use Chebyshev).

  2. The Phase 2 mean estimation incurs an error proportional to σ. One way to control this would be to bound 𝔼_{Phase 1}[σ²] (and then use Markov to get a high-probability bound on σ², and hence σ).

The quantities controlling the error here, Var_{Phase 1}[μ] and 𝔼_{Phase 1}[σ²], are explicitly calculable symmetric polynomials in 𝐩_1, …, 𝐩_d of degree at most 4, depending on n. In principle, then, one can relate these quantities to χ²(𝐩 ‖ 𝐮_d) = d‖𝐩 − 𝐮_d‖_2² itself, and derive a bound on how big n must be to (with high probability) get a good estimate of χ²(𝐩 ‖ 𝐮_d).

In the classical (no source code) case, this methodology is a way to obtain the O(d^{1/2}/ε²) sample complexity, adding to the number of existing classical sample-optimal algorithms for the task. (This method has some potentially useful applications; e.g., one could decouple the number of samples used in Phases 1 and 2 to obtain tradeoffs for memory-limited settings.) On one hand, with this method one can give a very compressed proof of the O(d^{1/2}/ε²) bound that, factoring out routine calculations, fits in half a page (see, e.g., [27, Sec. 10]). On the other hand, one has to execute the calculations and estimations with great care, lest one obtain a suboptimal result: there is a reason it took 8 years to get the optimal quadratic dependence on ε [17, 28]. (Technically, it took more than 8 years, as the proof of [28] was later shown to have a flaw; so the tight dependence had to wait until [4]. See [11, Section 2.3] for a discussion.)

In the case when source code is available, so that one can use the QME algorithm, how well does this methodology fare? On one hand, QME gives a quadratic improvement over naive classical mean estimation, meaning one can try to use significantly fewer samples in Phase 2. But when one balances out the sample complexity between the two Phases, it implies one is using fewer samples in Phase 1, and hence one gets worse concentration of μ around its mean in Phase 1. So the calculations become more delicate.

2.1 Heuristic calculations

Instead of diving into complex calculations, let's look at some heuristics. First, let's consider how the algorithm proceeds in the case when 𝐩 really is the uniform distribution 𝐮_d. In this case, as long as we're in a scenario where n ≪ d^{1/2}, we will likely get all distinct elements in Phase 1, meaning that X_j will be 1 for exactly n values of j and 0 otherwise. Then Y(j) will be d/n − 1 for n values of j and −1 otherwise. This indeed means μ = 𝔼_𝐩[Y] = (1/d)Σ_{j=1}^d Y(j) = 0 = χ²(𝐩 ‖ 𝐮_d) with certainty in Phase 1. This is very good; we get no error out of Phase 1. However, QME in Phase 2 will not perfectly return the value μ = 0; rather, it will return something in the range ±O(σ/n), where σ = ((1/d)Σ_{j=1}^d (Y(j) − 0)²)^{1/2} = (d/n − 1)^{1/2} ≈ d^{1/2}/n^{1/2}. Thus the value returned by QME may well be around d^{1/2}/n^{3/2}, which from the algorithm's point of view is consistent with χ²(𝐩 ‖ 𝐮_d) ≈ d^{1/2}/n^{3/2}. Thus the algorithm will only become confident that d_TV(𝐩, 𝐮_d)² ≲ d^{1/2}/n^{3/2}, and hence it can only confidently accept in the case 𝐩 = 𝐮_d provided d^{1/2}/n^{3/2} ≪ ε²; i.e., n ≫ d^{1/3}/ε^{4/3}. We thereby see that with this algorithm, a uniformity testing upper bound of O(d^{1/3}/ε^{4/3}) is the best we can hope for. If one also believes that this algorithm might be optimal (and it has been the method of choice for essentially all previously known results), then this could possibly be taken as evidence for our Conjecture 2.
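The numbers in this all-distinct heuristic can be checked exactly, with no sampling (a sketch under the assumption that all n draws really are distinct):

```python
import math

d, n = 10**6, 100            # n << sqrt(d), so we assume all n draws are distinct
# Y(j) = d/n - 1 on the n drawn elements, and -1 on the other d - n elements
Y_drawn, Y_rest = d / n - 1, -1.0

# mean and stddev of Y under p = u_d (each j has probability 1/d)
mu = (n * Y_drawn + (d - n) * Y_rest) / d
sigma2 = (n * Y_drawn**2 + (d - n) * Y_rest**2) / d - mu**2
sigma = math.sqrt(sigma2)

assert mu == 0.0                                   # mu = 0 exactly
assert abs(sigma - math.sqrt(d / n - 1)) < 1e-9    # sigma = sqrt(d/n - 1) ~ sqrt(d/n)
print(mu, sigma)   # 0.0 and roughly sqrt(d/n) = 100
```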

At this point, one might try to prove that complexity O(d^{1/3}/ε^{4/3}) is achievable; so far we have only argued that with this many samples, the algorithm will correctly accept when 𝐩 = 𝐮_d (with high probability). Again, before jumping into calculations, one might try to guess the "hardest" kind of ε-far distributions one might face, and try to work out the calculations for these cases. The hardest distributions in the classical case (i.e., the ones that lead to the matching Ω(d^{1/2}/ε²) lower bound) are very natural: they are the 𝐩's in which half of the elements j ∈ [d] have 𝐩_j = (1+2ε)/d and half have 𝐩_j = (1−2ε)/d. Assuming this is the "worst case", one can calculate what Var_{Phase 1}[μ] and 𝔼_{Phase 1}[σ²] will be, and the calculations turn out just as desired. That is, with n = O(d^{1/3}/ε^{4/3}), these two error quantities can be shown to be sufficiently small so that the overall algorithm will correctly become confident that χ²(𝐩 ‖ 𝐮_d) = d‖𝐩 − 𝐮_d‖_2² ≥ 4·d_TV(𝐩, 𝐮_d)² significantly exceeds ε², and hence the algorithm can correctly reject.
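The claimed values for this "hardest" family are immediate to verify: it has d_TV(𝐩, 𝐮_d) = ε and χ²(𝐩 ‖ 𝐮_d) = 4ε², so the inequality χ² ≥ 4·d_TV² is tight here. A quick check (variable names ours):

```python
d, eps = 1000, 0.1
p = [(1 + 2 * eps) / d] * (d // 2) + [(1 - 2 * eps) / d] * (d // 2)

d_tv = sum(abs(pj - 1 / d) for pj in p) / 2          # total variation distance
chi2 = d * sum((pj - 1 / d) ** 2 for pj in p)        # chi^2(p || u_d) = d * ||p - u_d||_2^2

assert abs(d_tv - eps) < 1e-9
assert abs(chi2 - 4 * eps**2) < 1e-9
print(d_tv, chi2)   # ~0.1 and ~0.04
```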

Everything therefore looks good, but there is a fly in the ointment. Even though this particular 𝐩 with its values of (1 ± 2ε)/d seems like the "hardest" distribution to face, one still has to reason about all possible 𝐩's with d_TV(𝐩, 𝐮_d) > ε. And when one does the calculations of Var[μ] and 𝔼[σ²] as prescribed by the standard methodology, the plan ends up failing. Specifically, one gets too much error for 𝐩's that have somewhat "heavy" elements, meaning 𝐩_j's with 𝐩_j ≫ 1/d. The prior works [10, 12] cope with this failure by taking more samples; i.e., setting n = O(d^{1/3}/ε^c) for c > 4/3 (specifically, [12] achieves c = 2). But our goal is to show that this is unnecessary – that the algorithm itself works, even though the standard and natural way of analyzing it fails.

In short, the reason the standard analysis of the algorithm fails is due to "rare events" that are caused by heavy elements in 𝐩. These j's with 𝐩_j ≫ 1/d may well still have 𝐩_j ≪ 1/n (for our desired n = O(d^{1/3}/ε^{4/3})), and thus be drawn only rarely in Phase 1. The major difficulty is that when they are drawn, they generate a very large contribution to σ², causing 𝔼_{Phase 1}[σ²] to be "misleadingly large". That is, when there are heavy elements, σ² may have the property of typically being much smaller than its expectation. Thus controlling the QME error using the expected value of σ² is a bad strategy.

Perhaps the key insight in our analysis is to show: In those rare Phase 1 outcomes when σ2 is unusually large, μ is also unusually large compared to its expectation. The latter event is helpful, because if μ ends up much bigger than its expectation, we can tolerate a correspondingly worse error-bar from QME. In short, we show that the rare bad outcomes for σ2 coincide with the rare good outcomes for μ.

In order to make this idea work out quantitatively, we (seem to) need to weaken our ambitions and get something a bit worse than a χ²-divergence-from-uniform estimation algorithm, in two ways. (This is fine, as our main goal is just a non-tolerant uniformity tester with respect to TV.) First, rather than insisting that we accept with high probability when χ²(𝐩 ‖ 𝐮_d) ≤ .99θ and reject with high probability when χ²(𝐩 ‖ 𝐮_d) > θ, we only require rejection when d_H²(𝐩, 𝐮_d) > θ. The reason is that the rare large values of σ² that we face are only comparable with the larger value d_H²(𝐩, 𝐮_d), and not with χ²(𝐩 ‖ 𝐮_d). (We remark that this χ²-versus-Hellinger-squared dichotomy is quite reminiscent of the one that occurs in classical works on identity testing, such as [4].)

As for the second weakening we need to make: we explicitly add to our tester a check that the value of max_j{X_j} arising after Phase 1 is not too large. Roughly speaking, this extra test ensures that there are no very heavy elements. (Of course, this is satisfied when 𝐩 = 𝐮_d, so we don't mind adding this test.) The reason we need to add this check is so that we can bound the quadratic expression Σ_{j=1}^d X_j² (which enters into the value of σ²) by max_j{X_j}·Σ_{j=1}^d X_j; in turn, once max_j{X_j} is checked to be small, this expression can be bounded by the linear quantity Σ_{j=1}^d X_j, which can be related to μ. It is by relating σ² to μ in this way that we are able to show the correlation between rare events – that when σ² is big, μ is also big.

To conclude, we apologize to the reader for writing a “technical overview” whose length is nearly comparable to that of the actual proof itself. While we tried to make our argument as streamlined and concise as possible, we felt that it was worth conveying the ideas and detours which led us there, and which, while now hidden, motivated the final proof.

3 Preliminaries

3.1 Probability distances

Throughout, log and ln denote the binary and natural logarithms, respectively. We identify a probability distribution 𝐩 over [d] = {1, 2, …, d} with its probability mass function (pmf), or, equivalently, with a vector 𝐩 ∈ ℝ^d such that 𝐩_i ≥ 0 for all i and Σ_{i=1}^d 𝐩_i = 1. For a subset S ⊆ [d], we accordingly let 𝐩(S) = Σ_{i∈S} 𝐩_i. The total variation distance between two distributions 𝐩, 𝐪 over [d] is defined as

d_TV(𝐩, 𝐪) = sup_{S⊆[d]} {𝐩(S) − 𝐪(S)} = ½‖𝐩 − 𝐪‖_1 ∈ [0, 1],    (6)

where the last equality is from Scheffé’s lemma. By Cauchy–Schwarz, this gives us the relation

½‖𝐩 − 𝐪‖_2 ≤ d_TV(𝐩, 𝐪) ≤ (√d/2)‖𝐩 − 𝐪‖_2.    (7)

We will in this paper also consider other notions of distance between probability distributions: the squared Hellinger distance, defined as

d_H²(𝐩, 𝐪) = Σ_{i=1}^d (√𝐩_i − √𝐪_i)² = ‖√𝐩 − √𝐪‖_2² ∈ [0, 2].    (8)

(Some texts normalize this by a factor of 1/2; we do not do so, as it makes our statements cleaner.) The chi-squared divergence is then defined as

χ²(𝐩 ‖ 𝐪) = Σ_{i=1}^d (𝐩_i − 𝐪_i)²/𝐪_i = (Σ_{i=1}^d 𝐩_i²/𝐪_i) − 1,    (9)

while the Kullback–Leibler divergence (least relevant to us, but quite common in the literature), in nats, is defined as

KL(𝐩 ‖ 𝐪) = Σ_{i=1}^d 𝐩_i ln(𝐩_i/𝐪_i).    (10)

As mentioned in Equation 3, we have the following well known [14] chain of inequalities:

d_TV(𝐩, 𝐪)² ≤ d_H²(𝐩, 𝐪) ≤ KL(𝐩 ‖ 𝐪) ≤ χ²(𝐩 ‖ 𝐪).    (11)

Moreover, for the special case of the uniform distribution 𝐮d over [d], we have

χ²(𝐩 ‖ 𝐮_d) = d‖𝐩 − 𝐮_d‖_2².    (12)
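These definitions, the chain of inequalities in Equation 11, and the identity in Equation 12 can be spot-checked numerically; a small sketch with helper names of our choosing:

```python
import math, random

def d_tv(p, q):  return sum(abs(a - b) for a, b in zip(p, q)) / 2
def d_h2(p, q):  return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
def kl(p, q):    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)
def chi2(p, q):  return sum((a - b) ** 2 / b for a, b in zip(p, q))

rng = random.Random(0)
d = 50
u = [1 / d] * d
for _ in range(1000):
    w = [rng.random() for _ in range(d)]
    s = sum(w)
    p = [x / s for x in w]                 # a random distribution on [d]
    # the chain (11): d_TV^2 <= d_H^2 <= KL <= chi^2, plus identity (12)
    assert d_tv(p, u) ** 2 <= d_h2(p, u) + 1e-12
    assert d_h2(p, u) <= kl(p, u) + 1e-12
    assert kl(p, u) <= chi2(p, u) + 1e-12
    assert abs(chi2(p, u) - d * sum((pi - 1 / d) ** 2 for pi in p)) < 1e-9
print("chain verified on 1000 random distributions")
```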

3.2 Distribution access models

For a probability distribution 𝐩 on [d], we say a unitary U_𝐩 is a synthesizer for 𝐩 if for some k,

U_𝐩 |0^k⟩ = Σ_{i∈[d]} √𝐩_i |i⟩|ψ_i⟩,    (13)

where the |ψ_i⟩'s are normalized states often called "garbage states". Note that any classical randomized circuit using S gates that samples from 𝐩 can be converted to a synthesizer U_𝐩 in a purely black-box way with gate complexity O(S). (See [22] for details and a more thorough discussion of synthesizers.)

In this paper, we say an algorithm makes t uses of "the code for 𝐩" to mean that we use (as a black box) the unitaries U_𝐩, U_𝐩†, and controlled-U_𝐩 a total of t times in the algorithm. Each of these unitaries is easy to construct given an explicit gate decomposition of U_𝐩, with the same gate complexity up to constant factors.

The quantum string oracle, which is used in many prior works, is a specific type of source code for 𝐩. Here we have standard quantum oracle access to a string x ∈ [d]^m of length m, for some m. For any symbol i ∈ [d], the probability 𝐩_i is defined as the frequency with which that symbol appears in x, i.e., 𝐩_i = |{j : x_j = i}|/m. Note that calling this oracle on the uniform superposition over [m] gives us a synthesizer for 𝐩. When a randomized sampler for 𝐩 is converted to a synthesizer, we get a quantum string oracle, but quantum string oracles are not as general as arbitrary synthesizers. For example, all probabilities described by a quantum string oracle will be integer multiples of 1/m, whereas an arbitrary synthesizer has no such constraint.
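As a concrete illustration of the last point, here is a small sketch (ours, with the string x chosen arbitrarily) of how a string oracle induces a synthesizer: querying the oracle on the uniform superposition over [m] produces (1/√m)·Σ_j |j⟩|x_j⟩, and by the Born rule the squared-amplitude mass on symbol i is exactly 𝐩_i = |{j : x_j = i}|/m.

```python
import math
from collections import Counter

# a quantum string oracle for p is just a string x in [d]^m;
# p_i is the frequency of symbol i in x
x = [0, 0, 0, 1, 2, 2, 3, 3]          # m = 8, d = 4  ->  p = (3/8, 1/8, 2/8, 2/8)
m, d = len(x), 4

# querying on the uniform superposition over [m] gives (1/sqrt(m)) sum_j |j>|x_j>
amp = 1 / math.sqrt(m)
prob = Counter()
for j in range(m):
    prob[x[j]] += amp ** 2            # Born rule: |amplitude|^2 per basis state |j>|x_j>

expected = {i: c / m for i, c in Counter(x).items()}
assert all(abs(prob[i] - expected[i]) < 1e-12 for i in range(d))
print({i: round(v, 3) for i, v in sorted(prob.items())})   # {0: 0.375, 1: 0.125, 2: 0.25, 3: 0.25}
```

Every probability here is a multiple of 1/m = 1/8, illustrating why string oracles are less general than arbitrary synthesizers.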

3.3 Quantum Mean Estimation

When we use QME, we will have the source code for some distribution 𝐩 on [d], and we will also have explicitly constructed some (rational-valued) random variable Y : [d] → ℚ (say, simply as a table). From this, one can easily generate code that outputs a sample from Y (i.e., outputs Y(𝒋) for 𝒋 ∼ 𝐩), using the code for 𝐩 just one time. We will then use the following QME result from [22]:

Theorem 5.

There is a computationally efficient quantum algorithm with the following guarantee: Given the source code for a random variable Y, as well as parameters n and δ, the algorithm uses the code O(n·log(1/δ)) times and outputs an estimate μ̂ such that Pr[|μ̂ − μ| > σ/n] ≤ δ, where μ = 𝔼[Y] and σ = stddev[Y].

4 Algorithm in the Large Distance Regime

In this section, we establish Theorem 1, our main technical contribution. We do this by proving the strictly stronger Theorem 4, which we restate more formally:

Theorem 6.

For any constant B > 0, there exists a computationally efficient quantum algorithm (Algorithm 1) with the following guarantees: on input 1/d ≤ θ ≤ 1, it makes O(d^{1/3}/θ^{2/3}) uses (where the hidden constant depends on B) of "the code" for an unknown probability distribution 𝐩 over [d], and satisfies

  1. If χ²(𝐩 ‖ 𝐮_d) ≤ .99θ and ‖𝐩‖_∞ ≤ B/d, then the algorithm will accept with probability at least .99.

  2. If d_H²(𝐩, 𝐮_d) > θ, then the algorithm will reject with probability at least .99.

Algorithm 1 for the large distance regime.
Proof.

Let us start by recording the following inequalities that we will frequently use:

n = c·d^{1/3}/θ^{2/3},   θ ≥ 1/d   ⟹   c/θ ≤ n ≤ c·d.    (14)

We begin with a simple lemma regarding the check on line 4:

Lemma 7.

If ‖𝐩‖_∞ ≤ B/d, then line 4 will reject with probability at most .001. Conversely, if ‖𝐩‖_∞ > 2L/n, then line 4 will reject with probability at least .999.

Proof.

Let 𝑿_j ∼ Bin(n, p_j) denote the number of times j is drawn. The second ("conversely") part of the proposition follows from a standard Chernoff bound. As for the first part, suppose ‖𝐩‖_∞ ≤ B/d. Now on one hand, if n ≤ d^{.99}/B, so that L = 100, we have

Pr[Bin(n, p_j) ≥ 100] ≤ (n choose 100)·p_j^{100} ≤ ((en/100)·p_j)^{100} ≤ (e/(100·d^{.01}))^{100} ≤ .001/d,    (15)

and thus 𝑿_j < 100 for all j except with probability at most .001, as desired. Otherwise, L = Bc·ln d, and since 𝔼[𝑿_j] ≤ Bn/d ≤ Bc, the desired result follows from a standard Chernoff and union bound (provided c is large enough). From this, we conclude:

  • In Case (1), Line 4 rejects with probability at most .001.

  • In Case (2), we may assume ‖𝐩‖_∞ ≤ 2L/n and max_j 𝑿_j ≤ L, since otherwise Line 4 rejects with probability at least .999. Call this observation (★).
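(As a quick sanity check, outside the formal proof: the union-bound tail estimate and binomial-coefficient bound used in Equation 15 can be verified numerically; the sample parameters below are ours, chosen for illustration.)

```python
import math

def exact_tail(n, p, k):
    # Pr[Bin(n, p) >= k], computed exactly
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, p, k = 500, 0.01, 20
union_bound = math.comb(n, k) * p**k        # Pr[Bin(n,p) >= k] <= C(n,k) p^k
stirling    = (math.e * n * p / k) ** k     # using C(n,k) <= (e n / k)^k

assert exact_tail(n, p, k) <= union_bound <= stirling
print(union_bound, stirling)
```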

Now to begin the QME analysis, write p_j = (1 + ε_j)/d, where ε_j ∈ [−1, d−1], and let 𝝁 = Σ_{j=1}^d p_j 𝒀_j, the mean of 𝒀 (from QME's point of view). Writing η ≔ d_H²(p, 𝐮_d), our first goal will be to show:

In Case (1), 𝝁 ≤ .991θ except with probability at most .001;    (16)
In Case (2), 𝝁 ≥ .997η except with probability at most .002.    (17)

Starting with Equation 16, a short calculation (using Σ_{j=1}^d ε_j = 0) shows

𝝁 = avg_{t=1}^n {ε_{𝑱_t}}   ⟹   𝔼[𝝁] = (1/d)·Σ_{j=1}^d ε_j² = χ²(p ‖ 𝐮_d)   ⟹   𝔼[𝝁] ≤ .99θ in Case (1).    (18)

Also in Case (1) we get from Equation 18 that

Var[𝝁] = (1/n)·Var_{𝒋∼p}[ε_𝒋] ≤ (1/n)·𝔼_{𝒋∼p}[ε_𝒋²] ≤ (B/(nd))·Σ_{j=1}^d ε_j² = (B/n)·χ²(p ‖ 𝐮_d) ≤ .99Bθ/n ≤ Bθ²/c,    (19)

the last inequality using Equation 14. Combining the preceding two inequalities and using Chebyshev, we indeed conclude Equation 16 (provided c=c(B) is sufficiently large).

Towards Equation 17, let b ≥ 2 be a certain universal constant to be chosen later, and say that j ∈ [d] is light if p_j ≤ b/d (i.e., ε_j ≤ b − 1), and heavy otherwise. We will write

𝝁₁ = (1/n)·Σ_{t : 𝑱_t heavy} ε_{𝑱_t} ≥ 0,   𝝁₂ = (1/n)·Σ_{t : 𝑱_t light} ε_{𝑱_t}   (so 𝝁 = 𝝁₁ + 𝝁₂),    (20)

and also observe

η = d_H²(p, 𝐮_d) = (1/d)·Σ_{j=1}^d (√(1+ε_j) − 1)² ≤ (1/d)·Σ_{j=1}^d min{|ε_j|, ε_j²} ≤ (1/d)·Σ_{heavy j} ε_j + (1/d)·Σ_{light j} ε_j² ≕ η₁ + η₂.    (21)

Let us now make some estimates. First:

p_heavy ≔ Σ_{j heavy} p_j = (1/d)·Σ_{j heavy} (1 + ε_j) ≥ η₁.    (22)

Also, similar to our Case (1) estimates we have

𝔼[𝝁₂] = (1/d)·Σ_{light j} (ε_j² + ε_j) = η₂ − η₁   (where we used Σ_{j=1}^d ε_j = 0),    (23)

and

Var[𝝁₂] = (1/n)·Var_{𝒋∼p}[𝟙{𝒋 light}·ε_𝒋] ≤ (1/n)·𝔼_{𝒋∼p}[𝟙{𝒋 light}·ε_𝒋²] ≤ (b/(nd))·Σ_{j light} ε_j²
 = (b/n)·η₂ ≤ (b/c)·θη₂ ≤ (b/c)·η₂η   (in Case (2)).    (24)

We will now establish Equation 17; in fact, we will even show the following very slightly stronger fact:

In Case (2), 𝝁 ≥ .997(η₁ + η₂) ≥ .997η except with probability at most .002.    (25)

We divide into two subcases:

Case (2a): η₁ ≤ .001η₂.

In this case we have η₂ ≥ (1/1.001)(η₁ + η₂), and 𝔼[𝝁₂] ≥ .999η₂ from Equation 23. Since Equation 24 implies Var[𝝁₂] ≤ 1.001(b/c)η₂², Chebyshev's inequality tells us that 𝝁₂ ≥ .998η₂ except with probability at most .001 (provided c is large enough). But then 𝝁 ≥ 𝝁₂ ≥ (.998/1.001)(η₁ + η₂) ≥ .997(η₁ + η₂), confirming Equation 25.

Case (2b): η₁ > .001η₂.

In this case we have η₁ ≥ (.001/1.001)(η₁ + η₂) ≥ .0009(η₁ + η₂). We now use that heavy j have ε_j ≥ b − 1 to observe that

𝝁₁ = (1/n)·Σ_{t : 𝑱_t heavy} ε_{𝑱_t} ≥ (b−1)·(fraction of 𝑱_t's that are heavy) = (b−1)·Bin(n, p_heavy)/n    (26)

(in distribution). We see that 𝔼[𝝁₁] ≥ (b−1)·p_heavy, and moreover concentration of Binomials and Equation 22 imply that

𝝁₁ ≥ ½(b−1)·p_heavy ≥ ½(b−1)·η₁   except with probability at most .001,    (27)

provided that p_heavy·n is a sufficiently large constant. But we can indeed ensure this by taking c sufficiently large: by Equation 22, being in Case (2b), and Equation 14, it holds that

p_heavy·n ≥ η₁·n ≥ .0009(η₁ + η₂)·n ≥ .0009η·n ≥ .0009θ·n ≥ .0009c.    (28)

At the same time, Equation 23 certainly implies 𝔼[𝝁₂] ≥ −η₁, and Equation 24 implies Var[𝝁₂] ≤ (b/c)·η₂(η₁ + η₂) ≤ 1000·1001·(b/c)·η₁² (using Case (2b)). Thus Chebyshev implies

𝝁₂ ≥ −1.1η₁   except with probability at most .001,    (29)

provided c is large enough. Combining Equations 27 and 29 yields

𝝁 = 𝝁₁ + 𝝁₂ ≥ ((b−1)/2 − 1.1)·η₁ ≥ .0009·((b−1)/2 − 1.1)·(η₁ + η₂)   except with probability at most .002,    (30)

which verifies Equation 25 provided b is a large enough constant.

We have now verified the properties of 𝝁 claimed in Equations 16 and 25. Next we analyze the random variable 𝝈² that represents the variance of 𝒀 (from QME's point of view). Our goal will be to show:

In Case (1), 𝝈²/(Cn)² ≤ 10⁻⁶θ²   except with probability at most .001,    (31)
In Case (2), 𝝈²/(Cn)² ≤ 10⁻⁶𝝁²   except with probability at most .001.    (32)

Together with Equations 16 and 25, these facts are sufficient to complete the proof of the theorem, by the QME guarantee of Theorem 5.

We have:

𝝈² ≔ Σ_{j=1}^d p_j 𝒀_j² − 𝝁² = (d/n)²·Σ_{j=1}^d p_j 𝑿_j² − (𝝁 + 1)² ≤ (d/n)²·Σ_{j=1}^d p_j 𝑿_j² ≕ 𝝈_S² + 𝝈_{S^c}²,    (33)

where we’ve defined 𝝈S2(d/n)2jSpj𝑿j2 and Sc=[d]S. We will be making two different choices for S later, but we will always assume

S ⊇ {j : j light},  which implies Σ_{j∈S} ε_j ≤ 0  (34)

(the implication because Σ_{j=1}^d ε_j = 0 and S^c contains only j’s with ε_j ≥ b−1 > 0). Now since 𝔼[𝑿_j²] = np_j(1−p_j) + (np_j)² ≤ np_j + (np_j)², we have

𝔼[𝝈_S²] ≤ (d²/n)·Σ_{j∈S} p_j² + d²·Σ_{j∈S} p_j³  (35)
≤ d/n + (2/n)·Σ_{j∈S}ε_j + (1/n)·Σ_{j∈S}ε_j² + 1 + (3/d)·Σ_{j∈S}ε_j + (3/d)·Σ_{j∈S}ε_j² + (1/d)·Σ_{j∈S}ε_j³  (36)
≤ (5cd/n)·(1 + (1/d)·Σ_{j∈S}ε_j + (1/d)·Σ_{j∈S}ε_j²) + (1/d)·Σ_{j∈S}ε_j³  (37)

(where the last inequality used 1/d ≤ c/n, equivalently 1 ≤ cd/n, from Equation 14). Using Equation 34 to drop the term of Equation 37 that’s linear in the ε_j’s, we thereby conclude

𝔼[𝝈_S²/(Cn)²] ≤ 𝔼[𝝈_S²/n²] ≤ (5cd/n³)·(1 + (1/d)·Σ_{j∈S}ε_j²) + (d^{1/2}/n²)·((1/d)·Σ_{j∈S}ε_j²)^{3/2}  (38)
≤ (5θ²/c²)·(1 + η_S) + (θ^{4/3}/(c²d^{1/6}))·η_S^{3/2},  (39)

where η_S ≔ (1/d)·Σ_{j∈S}ε_j². In Case (1) we select S = [d], so η_S = χ²(𝐩‖𝐮_d) ≤ .99θ ≤ θ ≤ 1, and the above bound gives

Case (1): 𝔼[𝝈²/(Cn)²] ≤ 10θ²/c² + θ^{17/6}/(c²d^{1/6}) ≤ 10⁻⁹θ²  (40)

(provided c is large enough). Now Equation 31 follows by Markov’s inequality.

In Case (2) we select S = {j : j light}, so η_S = η₂, and we conclude (using the obvious notation)

Case (2): 𝔼[𝝈_light²/(Cn)²] ≤ (5θ²/c²)·(1 + η₂) + (θ^{4/3}/(c²d^{1/6}))·η₂^{3/2} ≤ .4·10⁻⁹·(η₁+η₂)²,  (41)

(provided c is large enough), where we used θ ≤ η ≤ η₁+η₂ and also θ ≤ 1. We now complete the bounding of 𝝈² in Case (2) by two different strategies:

Case (2.i): n > d^{.99}/B.

In this case, L = Bc·ln d, and (⋆) tells us ‖𝐩‖∞ ≤ 2L/n, so we have

‖𝐩‖∞ ≤ 2Bc·ln d/n ≤ 2B²c·ln d/d^{.99}.  (42)

Now returning to Equation 37, we get

𝔼[𝝈_heavy²/(Cn)²] ≤ 5cd/(C²n³) + (5cd/(C²n³))·(1 + ε_max + (n/(5cd))·ε_max²)·(1/d)·Σ_{j heavy}ε_j  (43)
≤ 5θ²/(Cc)² + (5cd²/(C²n³))·(‖𝐩‖∞ + (n/(5c))·‖𝐩‖∞²)·η₁ ≤ 5θ²/(Cc)² + (14B⁶c²·ln²d/(C²d^{1.96}))·η₁,  (44)

where we used Equation 42 and n > d^{.99}/B. We can again bound the first expression in Equation 44, 5θ²/(Cc)², by an arbitrarily small constant times (η₁+η₂)², taking c large enough (since θ ≤ η₁+η₂). As for the second expression, either η₁ = 0 (there are no heavy j’s) or else η₁ ≥ (b−1)/d (there is at least one heavy j). In either case, we have η₁ ≤ (d/(b−1))·η₁² ≤ d·η₁², so we can bound this second expression by

(14B⁶c²·ln²d/(C²d^{.96}))·η₁² ≤ .4·10⁻⁹·(η₁+η₂)²  (45)

where we used C = C(c) sufficiently large (and we could have taken C = 1 were we willing to assume d sufficiently large). Putting this bound together with Equation 41 we obtain:

Case (2.i): 𝔼[𝝈²/(Cn)²] ≤ .8·10⁻⁹·(η₁+η₂)² ≤ (.8/.997²)·10⁻⁹·𝝁² ≤ 10⁻⁹·𝝁²,  (46)

using Equation 25. Equation 32 now follows (in this Case (2.i)) by Markov’s inequality.

Case (2.ii): n ≤ d^{.99}/B.

In this case we use a different strategy. Recall from Equation 33 that

𝝈² ≤ (d/n)²·Σ_{j=1}^d p_j𝑿_j² ≤ (d/n)²·‖𝑿‖∞·Σ_{j=1}^d p_j𝑿_j = (d/n)·‖𝑿‖∞·(1+𝝁).  (47)

By (⋆) we may assume ‖𝑿‖∞ ≤ L = 100, the equality because we are in Case (2.ii). Thus

𝝈²/(Cn)² ≤ 𝝈²/n² ≤ 100·(d/n³)·(1+𝝁) ≤ 100θ²/c³ + (100θ²/c³)·𝝁 ≤ 10⁻⁶𝝁².  (48)

(provided c is large enough), where we used θ ≤ η ≤ (1/.997)·𝝁 (from Equation 25) and also θ ≤ 1. This verifies Equation 32 in Case (2.ii), completing the proof.

References

  • [1] Scott Aaronson and Alex Arkhipov. The computational complexity of linear optics. Theory of Computing, 9(4):143–252, 2013. doi:10.4086/toc.2013.v009a004.
  • [2] Scott Aaronson and Yaoyun Shi. Quantum lower bounds for the collision and the element distinctness problems. J. ACM, 51(4):595–605, July 2004. doi:10.1145/1008731.1008735.
  • [3] Jayadev Acharya, Clément L. Canonne, Yanjun Han, Ziteng Sun, and Himanshu Tyagi. Domain compression and its application to randomness-optimal distributed goodness-of-fit. In Proceedings of Thirty Third Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research, pages 3–40. PMLR, July 2020. URL: http://proceedings.mlr.press/v125/acharya20a.html.
  • [4] Jayadev Acharya, Constantinos Daskalakis, and Gautam C. Kamath. Optimal Testing for Properties of Distributions. In Advances in Neural Information Processing Systems 28, pages 3577–3598. Curran Associates, Inc., 2015.
  • [5] Andris Ambainis. Quantum walk algorithm for element distinctness. SIAM Journal on Computing, 37(1):210–239, 2007. doi:10.1137/S0097539705447311.
  • [6] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, Brian Burkett, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Collins, William Courtney, Andrew Dunsworth, Edward Farhi, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Keith Guerin, Steve Habegger, Matthew P. Harrigan, Michael J. Hartmann, Alan Ho, Markus Hoffmann, Trent Huang, Travis S. Humble, Sergei V. Isakov, Evan Jeffrey, Zhang Jiang, Dvir Kafri, Kostyantyn Kechedzhi, Julian Kelly, Paul V. Klimov, Sergey Knysh, Alexander Korotkov, Fedor Kostritsa, David Landhuis, Mike Lindmark, Erik Lucero, Dmitry Lyakh, Salvatore Mandrà, Jarrod R. McClean, Matthew McEwen, Anthony Megrant, Xiao Mi, Kristel Michielsen, Masoud Mohseni, Josh Mutus, Ofer Naaman, Matthew Neeley, Charles Neill, Murphy Yuezhen Niu, Eric Ostby, Andre Petukhov, John C. Platt, Chris Quintana, Eleanor G. Rieffel, Pedram Roushan, Nicholas C. Rubin, Daniel Sank, Kevin J. Satzinger, Vadim Smelyanskiy, Kevin J. Sung, Matthew D. Trevithick, Amit Vainsencher, Benjamin Villalonga, Theodore White, Z. Jamie Yao, Ping Yeh, Adam Zalcman, Hartmut Neven, and John M. Martinis. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, October 2019. doi:10.1038/s41586-019-1666-5.
  • [7] Aleksandrs Belovs. Quantum algorithms for classical probability distributions. In Proceedings of the 27th Annual European Symposium on Algorithms (ESA), pages 50–59. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.1007/978-3-030-19955-5_5.
  • [8] Sergio Boixo, Sergei V. Isakov, Vadim N. Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, Michael J. Bremner, John M. Martinis, and Hartmut Neven. Characterizing quantum supremacy in near-term devices. Nature Physics, 14(6):595–600, April 2018. doi:10.1038/s41567-018-0124-x.
  • [9] Gilles Brassard, Peter Høyer, and Alain Tapp. Quantum counting. In Proceedings of the 25th Annual International Colloquium on Automata, Languages, and Programming (ICALP), pages 820–831. Springer–Verlag, 1998. doi:10.1007/bfb0055105.
  • [10] Sergey Bravyi, Aram Harrow, and Avinatan Hassidim. Quantum algorithms for testing properties of distributions. IEEE Transactions on Information Theory, 57(6):3971–3981, 2011. doi:10.1109/TIT.2011.2134250.
  • [11] Clément L. Canonne. Topics and techniques in distribution testing: A biased but representative sample. Foundations and Trends® in Communications and Information Theory, 19(6):1032–1198, 2022. doi:10.1561/0100000114.
  • [12] Sourav Chakraborty, Eldar Fischer, Arie Matsliah, and Ronald de Wolf. New Results on Quantum Property Testing. In 30th International Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2010), volume 8 of Leibniz International Proceedings in Informatics (LIPIcs), pages 145–156, Dagstuhl, Germany, 2010. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.FSTTCS.2010.145.
  • [13] Ilias Diakonikolas and Daniel M. Kane. A new approach for testing properties of discrete distributions. In 57th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2016. IEEE Computer Society, 2016.
  • [14] Alison Gibbs and Francis Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.
  • [15] András Gilyén and Tongyang Li. Distributional Property Testing in a Quantum World. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020), volume 151 of Leibniz International Proceedings in Informatics (LIPIcs), pages 25:1–25:19, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2020.25.
  • [16] Oded Goldreich. The uniform distribution is complete with respect to testing identity to a fixed distribution. Electronic Colloquium on Computational Complexity (ECCC), 23:15, 2016. URL: http://eccc.hpi-web.de/report/2016/015.
  • [17] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Technical Report TR00-020, Electronic Colloquium on Computational Complexity (ECCC), 2000.
  • [18] Lov Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the 28th Annual ACM Symposium on the Theory of Computing (STOC), pages 212–219. ACM, New York, 1996. doi:10.1145/237814.237866.
  • [19] Lov Grover. A framework for fast quantum mechanical algorithms. In Proceedings of the 30th Annual ACM Symposium on the Theory of Computing (STOC), pages 53–62. ACM, New York, 1998. doi:10.1145/276698.276712.
  • [20] Yassine Hamoudi. Quantum Algorithms for the Monte Carlo Method. PhD thesis, Université de Paris, 2021.
  • [21] Stefan Heinrich. Quantum summation with an application to integration. Journal of Complexity, 18(1):1–50, 2002. doi:10.1006/jcom.2001.0629.
  • [22] Robin Kothari and Ryan O’Donnell. Mean estimation when you have the source code; or, quantum Monte Carlo methods, pages 1186–1215. Society for Industrial and Applied Mathematics, January 2023. doi:10.1137/1.9781611977554.ch44.
  • [23] Samuel Kutin. Quantum lower bound for the collision problem with small range. Theory of Computing, 1(2):29–36, 2005. doi:10.4086/toc.2005.v001a002.
  • [24] Jingquan Luo, Qisheng Wang, and Lvzhou Li. Succinct quantum testers for closeness and k-wise uniformity of probability distributions. IEEE Transactions on Information Theory, 70(7):5092–5103, 2024.
  • [25] Ashley Montanaro. Quantum speedup of Monte Carlo methods. Proceedings of the Royal Society A, 471(2181):20150301, 20, 2015. doi:10.1098/rspa.2015.0301.
  • [26] Ashwin Nayak and Felix Wu. The quantum query complexity of approximating the median and related statistics. In Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, STOC ’99, pages 384–393, New York, NY, USA, 1999. Association for Computing Machinery. doi:10.1145/301250.301349.
  • [27] Ryan O’Donnell and John Wright. Learning and testing quantum states via probabilistic combinatorics and representation theory. In Current developments in mathematics 2021, pages 43–94. International Press, Somerville, MA, 2023.
  • [28] Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750–4755, 2008. doi:10.1109/TIT.2008.928987.

Appendix A Algorithm in the Small Distance Regime

In this appendix, we provide an alternative (and arguably simpler) proof of the main result of [24]:

Theorem 8.

There is a computationally efficient quantum algorithm (Algorithm 2) for uniformity testing with the following guarantees: it takes O(d^{1/2}/ε) “samples” from an unknown probability distribution 𝐩 over [d], and distinguishes with probability at least 2/3 between (1) χ²(𝐩‖𝐮_d) ≤ ε²/144, and (2) χ²(𝐩‖𝐮_d) > ε².

This in turn will follow from the more general result on tolerant ℓ₂ closeness testing, where one is given access to the source code for two unknown probability distributions 𝐩, 𝐪 over [d], and one seeks to distinguish ‖𝐩−𝐪‖₂ ≤ cτ from ‖𝐩−𝐪‖₂ > τ.

Theorem 9.

There is a computationally efficient quantum algorithm (Algorithm 2) for closeness testing with the following guarantees: it takes O(1/τ) “samples” from two unknown probability distributions 𝐩, 𝐪 over [d], and distinguishes with probability at least 2/3 between (1) ‖𝐩−𝐪‖₂ ≤ τ/12, and (2) ‖𝐩−𝐪‖₂ > τ.

Theorem 8 can then be obtained as a direct corollary by setting τ = ε/√d, recalling that when 𝐪 is the uniform distribution 𝐮_d, ℓ₂ distance and χ² divergence are equivalent:

‖𝐩−𝐮_d‖₂² = Σ_{i=1}^d (𝐩_i − 1/d)² = (1/d)·Σ_{i=1}^d (𝐩_i − 1/d)²/(1/d) = (1/d)·χ²(𝐩‖𝐮_d).
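This equivalence can be checked mechanically in exact arithmetic; the following small script (ours, using an arbitrary example vector 𝐩) does so:

```python
from fractions import Fraction

def l2_sq_to_uniform(p):
    # ‖p − u_d‖₂² for a probability vector p over [d]
    d = len(p)
    return sum((pi - Fraction(1, d)) ** 2 for pi in p)

def chi2_to_uniform(p):
    # χ²(p ‖ u_d) = Σ_i (p_i − 1/d)² / (1/d)
    d = len(p)
    return sum((pi - Fraction(1, d)) ** 2 / Fraction(1, d) for pi in p)

# An arbitrary distribution over d = 4 elements:
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
assert chi2_to_uniform(p) == len(p) * l2_sq_to_uniform(p)  # χ² = d·‖·‖₂²
```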

We emphasize that the result of Theorem 9 itself is not new, as a quantum algorithm achieving the same sample complexity (in the same access model) was recently obtained by [24]. (Technically, [24]’s result can be seen as slightly stronger, in that it allows testing ‖𝐩−𝐪‖₂ ≤ (1−γ)τ vs. ‖𝐩−𝐪‖₂ > τ, for an arbitrarily small constant γ > 0.) However, our algorithm differs significantly from the one in [24], and we believe it to be of independent interest for several reasons:

  • it is conceptually very simple: (classically) hash the domain down to two elements, and use QME to estimate the bias of the resulting Bernoulli;

  • it neatly separates the quantum and classical aspects of the task, only using QME (as a blackbox) in a single step of the algorithm;

in contrast to the algorithm of [24], it decouples the use of the source code for 𝐩 from that for 𝐪, allowing one to run our algorithm when the accesses to the two distributions are on different machines, in different locations, or even granted at different points in time (i.e., one can run part of the algorithm using the source code for 𝐩, and, one continent and a year apart, run the remaining part on the now-available source code for 𝐪 without needing 𝐩 anymore).

The idea behind Theorem 9 is relatively simple: previous work (in the classical setting) showed that hashing the domain from d down to a much smaller d′ ≪ d could yield sample-optimal testing algorithms in some settings, e.g., when testing under privacy, bandwidth, or memory constraints. Indeed, while this “domain compression” reduces the total variation distance by a factor Θ(√(d/d′)), this shrinkage is, in these settings, balanced by the reduction in domain size. The key insight in our algorithm is then to (1) use this hashing with respect to ℓ₂ distance, not total variation distance, and show that one can in this case get a two-sided guarantee on the distance (a low-distortion embedding) instead of a one-sided one; and (2) compress the domain all the way down to d′ = 2, so that one can then invoke the QME algorithm to simply estimate the bias of a coin to an additive ±τ, a task for which a quantum quadratic speedup is well known.

Proof of Theorem 9.

As mentioned above, a key building block of our algorithm is the following “binary hashing lemma,” a simple case of the domain compression primitive of [3]:

Lemma 10 (Random Binary Hashing; Lemma 2.9 and Remark 2.4 of [11]).

Let 𝐩, 𝐪 ∈ Δ([d]). Then, for every α ∈ [0, 1/2],

Pr_S[ |𝐩(S) − 𝐪(S)| ≥ α·‖𝐩−𝐪‖₂ ] ≥ (1/12)·(1 − 4α²)²,

where S ⊆ [d] is a uniformly random subset of [d].

Given our goal of tolerant testing, we also require a converse to Lemma 10, stated and proven below:

Lemma 11.

Let 𝐩, 𝐪 ∈ Δ([d]). Then, for every β ∈ [1/2, ∞),

Pr_S[ |𝐩(S) − 𝐪(S)| ≥ β·‖𝐩−𝐪‖₂ ] ≤ 1/(4β²),

where S ⊆ [d] is a uniformly random subset of [d].

Proof.

As in the proof of Lemma 10, we write δ ≔ 𝐩−𝐪 ∈ ℝ^d and 𝐩(S)−𝐪(S) = ½Z, where Z ≔ Σ_{i=1}^d δ_iξ_i for ξ₁, …, ξ_d i.i.d. Rademacher. We will use the following fact, established in the proof of that lemma, which we reproduce for completeness:

𝔼[Z²] = Σ_{1≤i,j≤d} δ_iδ_j·𝔼[ξ_iξ_j] = Σ_{i=1}^d δ_i² = ‖δ‖₂².  (49)
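Equation 49 can also be confirmed by brute force on a small example, averaging over all 2^d Rademacher sign vectors (our own sanity check, not part of the original proof):

```python
from fractions import Fraction
from itertools import product

# Small example pair of distributions and their difference δ = p − q:
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
q = [Fraction(1, 4)] * 4
delta = [pi - qi for pi, qi in zip(p, q)]
d = len(delta)

# E[Z²] for Z = Σ_i δ_i ξ_i, averaged over all 2^d sign vectors ξ
second_moment = sum(
    sum(s * di for s, di in zip(signs, delta)) ** 2
    for signs in product((-1, 1), repeat=d)
) / Fraction(2 ** d)

assert second_moment == sum(di ** 2 for di in delta)  # = ‖δ‖₂²
```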

By Markov’s inequality, we then have

Pr_S[ |𝐩(S) − 𝐪(S)| > β·‖𝐩−𝐪‖₂ ] = Pr_S[ Z² > 4β²·𝔼[Z²] ] ≤ 1/(4β²),

concluding the proof.

While the above two lemmas allow us to obtain a slightly more general result than in the theorem statement by keeping α, β as free parameters, for concreteness we set α ≔ 1/(2√2) and β ≔ 4. This implies the following:

  • If ‖𝐩−𝐪‖₂ > τ, then

    Pr_S[ |𝐩(S) − 𝐪(S)| ≥ τ/√8 ] ≥ 1/48;
  • If ‖𝐩−𝐪‖₂ ≤ τ/12, then

    Pr_S[ |𝐩(S) − 𝐪(S)| ≥ τ/√9 ] ≤ 1/64,

where S ⊆ [d] is a uniformly random subset of [d]. This allows us to distinguish between the two cases with only O(1) repetitions:

Algorithm 2 QME+Binary Hashing Tester.

A standard analysis shows that, for T a sufficiently large constant, with probability at least 2/3 the estimate (1/T)·Σ_{t=1}^T b_t will be within an additive 1/1000 of the corresponding value (either 1/48 or 1/64), in which case the output is correct. The total number of samples required is T times the sample complexity of the Quantum Mean Estimation call on Line 4, which is O(1/τ): the complexity of getting an O(τ)-additive estimate of the mean of a Bernoulli random variable with high (constant) probability. This concludes the proof.
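To make the structure of the tester concrete, here is a purely classical toy analogue (ours, not Algorithm 2 itself): it enumerates all 2^d subsets to compute the flagging probability exactly, with the exact values of 𝐩(S) and 𝐪(S) standing in for their QME estimates, and a cutoff placed between 1/64 and 1/48:

```python
from fractions import Fraction
from itertools import product

def flag_fraction(p, q, tau):
    # Exact Pr_S[|p(S) − q(S)| ≥ τ/3] over a uniformly random S ⊆ [d],
    # by enumerating all 2^d subsets. In the actual algorithm this
    # probability is estimated from T random subsets, with QME supplying
    # estimates of p(S) and q(S).
    d = len(p)
    flags = 0
    for mask in product((0, 1), repeat=d):
        pS = sum(pi for pi, m in zip(p, mask) if m)
        qS = sum(qi for qi, m in zip(q, mask) if m)
        flags += abs(pS - qS) >= tau / 3
    return Fraction(flags, 2 ** d)

d, tau = 6, Fraction(1, 2)
u = [Fraction(1, d)] * d
far = [Fraction(1)] + [Fraction(0)] * (d - 1)   # ‖far − u‖₂ > τ here
cutoff = Fraction(1, 55)                        # between 1/64 and 1/48

assert flag_fraction(far, u, tau) > cutoff      # far pair: flagged often
assert flag_fraction(u, u, tau) < cutoff        # identical pair: rarely
```

With identical inputs the flag probability is exactly 0, and for the far pair above it is 31/32, so any cutoff between 1/64 and 1/48 separates the two cases.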

Appendix B Algorithm in the Giant Distance Regime

In this appendix, we show that, in the (stronger) quantum string oracle model, one can perform tolerant uniformity testing with respect to χ² divergence in the “very large parameter regime,” that is, distinguish χ²(𝐩‖𝐮_d) ≤ cθ from χ²(𝐩‖𝐮_d) > θ for θ ≥ 1:

Theorem 12.

There is a computationally efficient quantum algorithm for uniformity testing with the following guarantees: For θ ≥ 1, the algorithm makes O(d^{1/3}/θ^{1/3}) calls to the quantum string oracle for an unknown distribution 𝐩 over [d], and distinguishes with probability at least .99 between

(1) χ²(𝐩‖𝐮_d) ≤ cθ,  and  (2) χ²(𝐩‖𝐮_d) > θ,  (50)

where c>0 is an absolute constant. Moreover, this query complexity is optimal.

Note that, as discussed in the introduction, this result does not imply anything in terms of total variation distance, as the latter is always at most 1; however, we believe this result to be of interest for at least two reasons: (1) it is in itself a reasonable (and often useful) testing question when total variation distance is not the most relevant distance measure, and it implies, for instance, distinguishing χ²(𝐩‖𝐮_d) ≤ cθ from KL(𝐩‖𝐮_d) > θ; and (2) one can show that this complexity is tight, by a reduction from the θ-to-1 collision problem, which provides additional evidence for Conjecture 2.

Proof.

The main ingredient of the proof is the following lemma, which guarantees that taking N = Θ(√(d/θ)) samples from the unknown distribution 𝐩 is enough to obtain (with high constant probability) a multiset of elements with, in one case, no collisions, and in the other, at least one collision:

Lemma 13.

For θ ≥ 1, there exists a constant c ∈ (0,1) such that taking N i.i.d. samples from an unknown 𝐩 over [d] results in a multiset S satisfying the following with probability at least .99:

  • If χ²(𝐩‖𝐮_d) ≤ cθ, then all elements in S are distinct;

  • If χ²(𝐩‖𝐮_d) > θ, then at least two elements in S are identical;

as long as 1601·√(d/θ) ≤ N ≤ (1/(10√c))·√(d/θ). (In particular, taking c ≔ 1/16010² suffices to ensure such a choice of N is possible.)

Before proving this lemma, we describe how it implies our stated complexity upper bound. Lemma 13 guarantees that we can reduce our testing problem to deciding whether, given oracle access to a string of size N = Θ(√(d/θ)), all the elements in it are distinct. This problem is solved by Ambainis’ element-distinctness quantum-walk algorithm [5] using O(N^{2/3}) = O(d^{1/3}/θ^{1/3}) quantum queries.
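The reduction just described can be sketched classically as follows (our sketch; sorting stands in for the quantum-walk element-distinctness subroutine, which is the only quantum ingredient):

```python
def has_collision(samples):
    # Classical element-distinctness check. Ambainis' quantum walk decides
    # the same predicate with O(N^{2/3}) oracle queries to the string.
    s = sorted(samples)
    return any(a == b for a, b in zip(s, s[1:]))

def collision_tester(sample_next, N):
    # Draw N = Θ(sqrt(d/θ)) samples and report "far from uniform" iff some
    # value repeats (cf. Lemma 13); `sample_next` is a hypothetical sampler.
    return has_collision([sample_next() for _ in range(N)])

assert has_collision([3, 1, 4, 1, 5]) is True
assert has_collision([2, 7, 1, 8]) is False
assert collision_tester(lambda: 7, 5) is True   # point mass always collides
```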

Proof of Lemma 13.

Suppose we take N i.i.d. samples X1,,XN from 𝐩, and count the number Z of collisions among them:

Z ≔ Σ_{1≤i<j≤N} 𝟙{X_i = X_j}.

Letting δ ≔ 𝐩−𝐮_d and pow_t(x) ≔ Σ_{i=1}^d x_i^t for every integer t ≥ 0 and vector x ∈ ℝ^d (so that δ_i = 𝐩_i − 1/d for all i), we have pow₁(δ) = 0, and

pow₂(δ) = ‖𝐩−𝐮_d‖₂² = (1/d)·χ²(𝐩‖𝐮_d).

Now, it is not hard to verify that 𝔼[Z] = (N choose 2)·‖𝐩‖₂² = (N choose 2)·(pow₂(δ) + 1/d), and

Var[Z] = (N choose 2)·‖𝐩‖₂²·(1 − ‖𝐩‖₂²) + 6·(N choose 3)·(‖𝐩‖₃³ − ‖𝐩‖₂⁴)
≤ 𝔼[Z] + 6·(N choose 3)·(pow₃(δ) + (3/d)·pow₂(δ))  (51)

From this we get, setting τ ≔ √(θ/d) ≥ 1/√d:

  • If χ²(𝐩‖𝐮_d) ≤ cθ, then pow₂(δ) ≤ c·τ², and as long as N ≤ 1/(10√c·τ) we have (N choose 2)·(cτ² + 1/d) ≤ 1/100, so that by Markov’s inequality

    Pr[Z ≥ 1] ≤ Pr[Z ≥ 100·𝔼[Z]] ≤ 1/100
  • If χ²(𝐩‖𝐮_d) > θ, then pow₂(δ) > τ², and by Chebyshev’s inequality and Equation 51

    Pr[Z = 0] ≤ Pr[|Z − 𝔼[Z]| ≥ 𝔼[Z]] ≤ 1/𝔼[Z] + (4/N)·(pow₃(δ) + (3/d)·pow₂(δ)) / (pow₂(δ) + 1/d)²
    ≤ 2/(N(N−1)τ²) + (4/N)·(pow₂(δ)^{3/2} + (3/d)·pow₂(δ)) / pow₂(δ)²
    ≤ 3/(N²τ²) + 4/(Nτ) + 12/(Ndτ²)
    ≤ 3/(N²τ²) + 4/(Nτ) + 12/N   (using τ² ≥ 1/d)
    ≤ 3/(N²τ²) + 16/(Nτ),

    which is at most 1/100 for N ≥ 1601/τ.

This proves the lemma.

This concludes the proof of the upper bound part of Theorem 12. To conclude, it only remains to show that this complexity is indeed optimal. For this, we need a lower bound of [23], which generalized a lower bound of Aaronson and Shi [2]:

Theorem 14 ([23]).

Let d > 0 and r ≥ 2 be integers such that r ∣ d, and let f : [d] → [d] be a function to which we have quantum oracle access. Then deciding whether f is 1-to-1 or r-to-1, promised that one of these holds, requires Ω((d/r)^{1/3}) quantum queries.

When we view this function as a quantum string oracle for a probability distribution, the function being 1-to-1 corresponds to the uniform distribution on [d]. In the other case, the distribution is uniform on a subset of size d/r, for any r ≥ θ+1 dividing d. An easy calculation shows that the second distribution is at χ² divergence

χ²(𝐩‖𝐮_d) = Σ_{i∈[d]} 𝐩_i²/(1/d) − 1 = (d/r)·(r/d)²·d − 1 = r − 1 ≥ θ,  (52)

from uniform, which completes the proof.
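The calculation in Equation 52 can be replayed numerically for a concrete (hypothetical) choice of d and r:

```python
from fractions import Fraction

def chi2_to_uniform(p, d):
    # χ²(p ‖ u_d) = d·Σ_i p_i² − 1 for a probability vector over [d]
    return d * sum(pi * pi for pi in p) - 1

d, r = 12, 4                      # r divides d
support = d // r                  # an r-to-1 oracle hits only d/r values
p = [Fraction(r, d)] * support + [Fraction(0)] * (d - support)
assert chi2_to_uniform(p, d) == r - 1
```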

Appendix C Reduction from Identity to Uniformity Testing

As mentioned in the introduction, there is a known reduction from identity to uniformity testing, due to Goldreich [16] and inspired by [13], which, in a blackbox way, converts an instance of identity testing (in total variation distance) with reference distribution 𝐪 over [d] and distance parameter ε into an instance of uniformity testing over [4d] with distance parameter ε/4. (Here, we follow the exposition and parameter setting of [11, Section 2.2.3].)

To be able to use it in our setting, all we need to check is that this blackbox reduction Φ_𝐪 preserves access to “the code”: that is, given the code C_𝐩 for a probability distribution 𝐩 over [d], we can efficiently obtain the code C_𝐩′ for the resulting distribution 𝐩′ = Φ_𝐪(𝐩) over [4d]. To do so, note that Φ_𝐪 is the composition of 3 successive mappings,

Φ_𝐪 = Φ_𝐪^{(1)} ∘ Φ_𝐪^{(2)} ∘ Φ_𝐪^{(3)},

where Φ_𝐪^{(3)} : Δ([d]) → Δ([d]), Φ_𝐪^{(2)} : Δ([d]) → Δ([d+1]), and Φ_𝐪^{(1)} : Δ([d+1]) → Δ([4d]). So it suffices to show that each of these 3 mappings preserves access to the code generating a sample from the resulting distribution.

The first, Φ_𝐪^{(3)}, is the easiest, as it consists only in mixing its input with the uniform distribution:

    Φ_𝐪^{(3)}(𝐩) = ½·𝐩 + ½·𝐮_d,

    for which a circuit can be easily obtained, given a circuit for 𝐩.

  • The second, Φ𝐪(2), “rounds down” the probability of each of the d elements of the domain, and sends the remaining probability mass to a (d+1)-th new element:

    Φ_𝐪^{(2)}(𝐩)_i = (⌊4d𝐪_i⌋/(4d𝐪_i))·𝐩_i  for i ∈ [d],  and  Φ_𝐪^{(2)}(𝐩)_{d+1} = 1 − Σ_{i=1}^d (⌊4d𝐪_i⌋/(4d𝐪_i))·𝐩_i.

    This corresponds to adding to the circuit C_𝐩 for 𝐩 a “postprocessing circuit” which, if the output of C_𝐩 is i, outputs i with probability ⌊4d𝐪_i⌋/(4d𝐪_i) (and outputs d+1 otherwise).

The third, Φ_𝐪^{(1)}, assumes that the reference distribution 𝐪 is “grained” (namely, all its probabilities are positive multiples of 1/(4d)), which will be the case after the first two mappings (and is fully known). (Specifically, when chaining the three mappings, the reference distribution called 𝐪 here is actually Φ_𝐪^{(2)} ∘ Φ_𝐪^{(3)}(𝐪).) Having partitioned [4d] into sets S₁, …, S_{d+1} where

    |S_i| = 4d𝐪_i ≥ 1,

Φ_𝐪^{(1)} is given by

    Φ_𝐪^{(1)}(𝐩)_j = Σ_i (𝐩_i/|S_i|)·𝟙{j ∈ S_i},  j ∈ [4d].

    This corresponds to adding to the circuit C_𝐩 for 𝐩 a “postprocessing circuit” which, if the output of C_𝐩 is i, outputs an element of S_i uniformly at random. (Importantly, S₁, …, S_{d+1} are uniquely determined by 𝐪, and do not depend on 𝐩 or C_𝐩 at all.)

To summarize, each of these three mappings can be implemented so as to provide, given a circuit C_𝐩 for 𝐩, a circuit C_𝐩′ for the output 𝐩′; altogether, the reduction can thus be implemented in a way which preserves access to “the code.”