Uniformity Testing When You Have the Source Code
Abstract
We study quantum algorithms for verifying properties of the output probability distribution of a classical or quantum circuit, given access to the source code that generates the distribution. We consider the basic task of uniformity testing, which is to decide if the output distribution is uniform on or -far from uniform in total variation distance. More generally, we consider identity testing, which is the task of deciding if the output distribution equals a known hypothesis distribution, or is -far from it. For both problems, the previous best known upper bound was . Here we improve the upper bound to , which we conjecture is optimal.
Keywords and phrases: distribution testing, uniformity testing, quantum algorithms
Funding: Clément L. Canonne: Supported by an ARC DECRA (DE230101329).
2012 ACM Subject Classification: Theory of computation → Design and analysis of algorithms; Theory of computation → Quantum computation theory
Event: 20th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2025)
Editor: Bill Fefferman
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
For 30 years we have known that quantum computers can solve certain problems significantly faster than any known classical algorithm. Traditionally, most of the research in this area has focused on decision problems (like SAT) or function problems (like Factoring), where for each possible input there is a unique “correct” output. However, we have also found that quantum computers can yield speedups for the task of sampling from certain probability distributions. Prominent examples include boson sampling [1] and random circuit sampling [8]. Sampling tasks have seemed more natural for NISQ-era quantum computation, and indeed many of the first candidate experimental demonstrations of quantum advantage have been for sampling problems [6].
One of the downsides of sampling problems is the challenge of verifying the output of an algorithm, whether classical or quantum, that claims to sample from a certain distribution. As a simple example, consider a classical or quantum algorithm that implements a supposed hash function with output alphabet . The algorithm designer claims that the output distribution of this hash function is uniform on . If denotes the actual output distribution of the algorithm, and denotes the uniform distribution on , then we would like to test whether , and reject the claim if is in fact -far from in total variation distance, meaning . (We will also consider other distance measures in this work, since the complexity of the testing task is sensitive to this choice.)
This verification task is called “uniformity testing” (in total variation distance) and its complexity is well studied in the classical literature. If we only have access to samples from , but are not allowed to inspect the algorithm that produces these samples, it is known that samples are necessary and sufficient to solve this problem; there are various classical algorithms that achieve this bound (starting with that of [28]; see, e.g., [11] for a detailed survey and discussion), and it is also not possible to do better with a quantum algorithm. But what if – as in the examples above – we do have access to the algorithm that produces ? Can we improve on this complexity if we have access to the “source code” of the algorithm?
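For intuition about the classical regime, one can explore uniformity testing with a small simulation. The collision-based statistic below is one standard classical choice; the code, its parameter values, and the particular biased distribution are our own illustrative sketch, not an algorithm from this paper.

```python
import random
from collections import Counter

def collision_statistic(samples):
    """Fraction of ordered pairs of samples that collide; its expectation
    is sum_y p_y^2, which equals 1/n exactly when p is uniform on [n]."""
    m = len(samples)
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) for c in counts.values())
    return colliding_pairs / (m * (m - 1))

random.seed(0)
n, m, eps = 1000, 20000, 0.5
uniform_samples = random.choices(range(n), k=m)
# a distribution eps-far from uniform in TV: half the elements get mass
# (1+eps)/n and half get (1-eps)/n, so sum_y p_y^2 = (1 + eps^2)/n
weights = [1 + eps] * (n // 2) + [1 - eps] * (n // 2)
biased_samples = random.choices(range(n), weights=weights, k=m)
s_u = collision_statistic(uniform_samples)
s_b = collision_statistic(biased_samples)
print(s_u, s_b)  # s_u near 1/n = 0.001, s_b near (1 + eps^2)/n = 0.00125
```

A tester can threshold this statistic between the two expectations; how many samples are needed for the statistic to concentrate well enough is exactly the sample-complexity question discussed above.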
Having the source code
To clarify, the “source code” for a classical randomized sampling algorithm means a randomized circuit (with no input) whose output is one draw from . More generally, the “source code” for a quantum sampling algorithm means a unitary quantum circuit (with all input qubits fixed to ) which gives one draw from when some of its output bits are measured in the standard basis and the rest are discarded.¹ The simplest way to use the code for is to run it, obtaining one sample. If has size , then getting one sample this way has cost . Another way to use the code is to deterministically compute all its output probabilities; this gives one perfect information about , but has cost bound . But quantum computing has suggested a third way to use the code: “running it in reverse”. For example, Grover’s original algorithm [18] can be seen as distinguishing two possibilities for on , namely or , while using only forwards/backwards executions of . The total cost here is , the same as the cost for samples.

¹ This is sometimes termed the “purified quantum query access model”, and is the most natural and general model. The “quantum string oracle”, referenced later in Table 1, refers to a situation in which one assumes a very specific type of source code for (thus making algorithmic tasks easier). See Section 3 for details and [7] for a thorough discussion.
We suggest that the utility of “having the source code” for distribution testing problems remains notably underexplored. Indeed, there is significant room for improvement in the bounds for even the most canonical of all such problems: uniformity testing. Our main theorem is the following:
Theorem 1.
There is a computationally efficient quantum algorithm for uniformity testing with the following guarantees: given , the algorithm makes uses of “the code” for an unknown distribution over , and distinguishes with probability at least between
| (1) |
The main idea behind this theorem is to combine very careful classical probabilistic analysis with a black-box use of Quantum Mean Estimation (QME) [19, 9, 21, 25, 20, 22]; see Section 2 for further discussion. Table 1 below compares our result to prior work on the problem.
Table 1 has two columns because it seems that different algorithms are necessary depending on how and relate. (Interestingly, this is not the case in the classical no-source-code model.) Thus combining our new result with that of [24], the best known upper bound becomes . We remark that although [24]’s algorithm/analysis is already simple, we give an alternative simple algorithm and analysis achieving in Appendix A, employing the classical analysis + QME approach used in the proof of our main theorem.
Lower bounds?
As for lower bounds (holding even in the quantum string oracle model): complexity is necessary even in the case of constant , following from work of [26]; and, [12] showed a lower bound of even in the case of constant , by reduction from the collision problem [2]. For reasons discussed in Section 2, we make the (somewhat bold) conjecture that our new upper bound is in fact tight for all and :
Conjecture 2.
Any algorithm that distinguishes from with success probability at least requires uses of the code for . (Moreover, we conjecture this lower bound in the stronger quantum string oracle model.)
Identity testing
Several prior works in this area have also studied the following natural generalization of uniformity testing: testing identity of the unknown distribution to a known hypothesis distribution . An example application of this might be when is a Porter–Thomas-type distribution arising as the ideal output of a random quantum circuit. Luckily, fairly recent work has given a completely generic reduction from any fixed identity testing problem to the uniformity testing problem; see [16], or [11, Section 2.2.3]. We can therefore immediately extend our new theorem to the general identity-testing setting:
Corollary 3.
There is a computationally efficient quantum algorithm for identity testing to a reference distribution over with the following guarantees: The algorithm makes uses of “the code” for an unknown distribution over , and distinguishes with probability at least between
| (2) |
(For completeness, we verify in Appendix C that the blackbox reduction does indeed carry through in our setting, preserving access to “the code”.)
More fine-grained results
In proving our main theorem, we will in fact prove a strictly stronger version, one which is more fine-grained in two ways:
(1) Tolerance: Not only does our test accept with high probability when , it also accepts with high probability when is sufficiently close to .
(2) Stricter distance measure: Not only does our test reject with high probability when , it also rejects with high probability when . (This is stronger, since always.)
To elaborate, recall the below chain of inequalities, which also includes KL- and -divergence. (We review probability distance measures in Section 3.)
| (3) |
The strictly stronger version of Theorem 1 that we prove is:
Theorem 4.
There is a computationally efficient quantum algorithm for uniformity testing with the following guarantees: For , the algorithm makes uses of “the code” for an unknown distribution over , and distinguishes with probability at least between
| (4) |
Additional results
Speaking of -divergence, we mention two additional results we prove at the end of our work. These results additionally inform our Conjecture 2.
First, as mentioned earlier, in Appendix A we give an alternative proof of the upper bound of [24], and – as in that work – our result is tolerant with respect to -divergence. That is, we prove the strictly stronger result that for , one can use the code times to distinguish from (for some constant ).
Second, recall that can be as large as . For example, for any set of size . Thus it makes sense to consider the uniformity testing problem even with respect to a -divergence threshold that exceeds . In Appendix B we show (albeit only in the quantum string oracle model) that for , one can use the code times to distinguish from , and this is optimal.
2 Technical overview of our proof
Our main algorithm is concerned with achieving the best possible -dependence for uniformity testing while maintaining a -dependence of ; in this way, it is best compared with the older works of [10, 12], the latter of which achieves complexity , as well as the classical (no-source-code) algorithm achieving complexity .
In fact, all four algorithms here are almost the same (except in terms of the number of samples they use).
Let us describe our viewpoint on this common methodology.
We consider the algorithm as being divided into two Phases, and we may as well assume each Phase uses samples.
Phase 1 will have the following properties:
- It will make black-box draws from (i.e., the source code is not used in Phase 1).
- Using these draws, Phase 1 will end by constructing a certain “random variable” – in the technical sense of a function .
- The mean of this random variable , vis-à-vis the unknown distribution , will ideally be close to . That is, ideally .
Phase 2 then performs a mean estimation algorithm on (vis-a-vis ) to get an estimate of and therefore of . Ideally, the resulting overall algorithm is not just a uniformity tester, but a -divergence-from-uniformity estimator. This could then be weakened to a TV-distance uniformity tester using the inequality .
The mean estimation algorithm used in Phase 2 differs depending on whether one has the source code or not. In the classical (no source code) model, one simply uses the naive mean estimation algorithm based on more black-box samples; by Chebyshev’s inequality, this will (with high probability) give an estimate of to within , where . In the case of a quantum tester with source code access, we can use a Quantum Mean Estimation (QME) routine; in particular, the one from [22] will (with high probability) yield an estimate of to within .²

² This QME routine was not available at the time of [10, 12], which had to make do with Quantum Approximate Counting [9] – essentially, QME for Bernoulli random variables. But this is not the source of our improvement; one can obtain our main theorem with only a -factor loss using just Quantum Approximate Counting.
A subtle aspect of this overall plan is that the mean and standard deviation of are themselves random variables (in the usual sense), where the randomness comes from Phase 1. Thus it is natural to analyze and . Of course, these depend on the definition of , which we now reveal: , where denotes the number of times was drawn in Phase 1. The point of this definition of is that a short calculation implies
| (5) |
that is, the random variable is an unbiased estimator for our quantity of interest, the -divergence of from . This is excellent, because although the algorithm does not see at the end of Phase 1, it will likely get a good estimate of it at the end of Phase 2…so long as (the random variable) is small.
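The unbiasedness claim can be checked numerically. The following purely classical sketch implements both Phases, with naive sampling standing in for QME in Phase 2; the concrete definition f(y) = (n/m)·N_y − 1 used here, and all parameter choices, are our own illustration of the two-phase methodology, not the paper's actual algorithm.

```python
import random

def chi_sq_estimate(p, m, rng):
    """Two-phase sketch of estimating the chi^2 divergence of p from
    uniform on [n].  Phase 1: draw m samples from p, recording counts N_y.
    Phase 2: estimate the mean of f over fresh draws from p, where
    f(y) = (n/m) * N_y - 1; over both phases, E[f] = chi^2(p || uniform)."""
    n = len(p)
    counts = [0] * n
    for y in rng.choices(range(n), weights=p, k=m):  # Phase 1
        counts[y] += 1
    f = [(n / m) * counts[y] - 1 for y in range(n)]
    phase2 = rng.choices(range(n), weights=p, k=m)   # naive stand-in for QME
    return sum(f[y] for y in phase2) / m

rng = random.Random(1)
n, m, eps = 400, 50000, 0.5
uniform_p = [1 / n] * n
biased_p = [(1 + eps) / n] * (n // 2) + [(1 - eps) / n] * (n // 2)
# true chi^2 divergences from uniform: 0 and eps^2 = 0.25, respectively
est_u = sum(chi_sq_estimate(uniform_p, m, rng) for _ in range(5)) / 5
est_b = sum(chi_sq_estimate(biased_p, m, rng) for _ in range(5)) / 5
print(est_u, est_b)
```

With the large m used here both estimates concentrate well; the whole analysis in the paper concerns how small m can be made before this concentration breaks down.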
We therefore finally have two sources of uncertainty about our final error (in estimating ):
1. Although , the random variable may have fluctuated around its expectation at the end of Phase 1. One way to control this would be to bound (and then use Chebyshev).
2. The Phase 2 mean estimation incurs an error proportional to . One way to control this would be to bound (and then use Markov to get a high-probability bound on , and hence ).
The quantities controlling the error here, and , are explicitly calculable symmetric polynomials in of degree at most , depending on . In principle, then, one can relate these quantities to itself, and derive a bound on how big must be to (with high probability) get a good estimate of .
In the classical (no source code) case, this methodology is a way to obtain the sample complexity, adding to the number of existing classical sample-optimal algorithms for the task. (This method in particular has some potentially useful applications; e.g., one could consider decoupling the number of samples used in Phases 1 and 2 to obtain tradeoffs for memory-limited settings.) On one hand, with this method one can give a very compressed proof of the bound that, factoring out routine calculations, fits in half a page (see, e.g., [27, Sec. 10]). On the other hand, one has to execute the calculations and estimations with great care, lest one obtain a suboptimal result (there is a reason it took years³ to get the optimal quadratic dependence on [17, 28]).

³ Technically, it took more than years, as the proof of [28] was later shown to have a flaw; so the tight dependence had to wait until [4]. See [11, Section 2.3] for a discussion.
In the case when source code is available, so that one can use the QME algorithm, how well does this methodology fare? On one hand, QME gives a quadratic improvement over naive classical mean estimation, meaning one can try to use significantly fewer samples in Phase 2. But when one balances out the sample complexity between the two Phases, this means using fewer samples in Phase 1, and hence one gets worse concentration of around its mean in Phase 1. So the calculations become more delicate.
2.1 Heuristic calculations
Instead of diving into complex calculations, let’s look at some heuristics. First, let’s consider how the algorithm proceeds in the case when really is the uniform distribution . In this case, as long as we’re in a scenario where , we will likely get all distinct elements in Phase 1, meaning that will be for exactly values of and will be otherwise. Then will be for values of and will be otherwise. This indeed means with certainty in Phase 1. This is very good; we get no error out of Phase 1. However QME in Phase 2 will not perfectly return the value ; rather, it will return something in the range , where . Thus the value returned by QME may well be around , which from the algorithm’s point of view is consistent with . Thus the algorithm will only become confident that , and hence it can only confidently accept in the case provided ; i.e., . We thereby see that with this algorithm, a uniformity testing upper bound of is the best we can hope for. If one also believes that this algorithm might be optimal (and it has been the method of choice for essentially all previously known results), then this could possibly be taken as evidence for our Conjecture 2.
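The “all distinct elements” claim above is just the birthday bound, which one can check empirically; the parameters in this sketch are our own illustrative choices.

```python
import random

rng = random.Random(7)
n, trials = 10 ** 6, 200

def frac_all_distinct(m):
    """Fraction of trials in which m uniform draws from [n] are all
    distinct (roughly exp(-m^2 / (2n)) by the birthday bound)."""
    hits = sum(len({rng.randrange(n) for _ in range(m)}) == m
               for _ in range(trials))
    return hits / trials

p_small = frac_all_distinct(100)    # m << sqrt(n): collisions are rare
p_large = frac_all_distinct(5000)   # m >> sqrt(n): collisions near-certain
print(p_small, p_large)
```

So in the regime where m is well below √n, Phase 1 sees all-distinct samples with high probability, exactly as the heuristic assumes.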
At this point, one might try to prove that complexity is achievable; so far we have only argued that with this many samples, the algorithm will correctly accept when (with high probability). Again, before jumping into calculations, one might try to guess the “hardest” kind of -far distributions one might face, and try to work out the calculations for these cases. The hardest distributions in the classical case (i.e., the ones that lead to the matching lower bound) are very natural: they are the ’s in which half of the elements have and half have . Assuming this is the “worst case”, one can calculate what and will be, and the calculations turn out just as desired. That is, with , these two error quantities can be shown to be sufficiently small so that the overall algorithm will correctly become confident that significantly exceeds , and hence the algorithm can correctly reject.
Everything therefore looks good, but there is a fly in the ointment. Even though this particular with its values of seems like the “hardest” distribution to face, one still has to reason about all possible ’s with . And when one does the calculations of and as prescribed by the standard methodology, the plan ends up failing. Specifically, one gets too much error for ’s that have somewhat “heavy” elements, meaning ’s with . The prior works [10, 12] cope with this failure by taking more samples; i.e., setting for (specifically, [12] achieves ). But our goal is to show that this is unnecessary – that the algorithm itself works, even though the standard and natural way of analyzing it fails.
In short, the standard analysis of the algorithm fails because of “rare events” caused by heavy elements in . These ’s with may well still have (for our desired ), and thus be drawn only rarely in Phase 1. The major difficulty is that when they are drawn, they generate a very large contribution to , causing to be “misleadingly large”. That is, when there are heavy elements, may have the property of typically being much smaller than its expectation. Thus controlling the QME error using the expected value of is a bad strategy.
Perhaps the key insight in our analysis is to show: In those rare Phase 1 outcomes when is unusually large, is also unusually large compared to its expectation. The latter event is helpful, because if ends up much bigger than its expectation, we can tolerate a correspondingly worse error-bar from QME. In short, we show that the rare bad outcomes for coincide with the rare good outcomes for .
In order to make this idea work out quantitatively, we (seem to) need to weaken our ambitions and get something a bit worse than a -divergence-from-uniform estimation algorithm, in two ways. (This is fine, as our main goal is just a non-tolerant uniformity tester with respect to TV.) First, rather than insisting that we accept with high probability when and reject with high probability when , we need to only require rejection when . The reason is that the rare large values of that we face are only comparable with the larger value , and not with .⁴

⁴ We remark that this -versus-Hellinger-squared dichotomy is quite reminiscent of the one that occurs in classical works on identity testing, such as [4].
As for the second weakening we need to make: We explicitly add to our tester a check that the value of arising after Phase 1 is not too large. Roughly speaking, this extra test ensures that there are no very heavy elements. (Of course, this is satisfied when , so we don’t mind adding this test.) The reason we need to add this check is so that we can bound the quadratic expression (which enters into the value of ) by ; in turn, once is checked to be small, this expression can be bounded by the linear quantity , which can be related to . It is by relating to in this way that we are able to show the correlation between rare events – that when is big, is also big.
To conclude, we apologize to the reader for writing a “technical overview” whose length is nearly comparable to that of the actual proof itself. While we tried to make our argument as streamlined and concise as possible, we felt that it was worth conveying the ideas and detours which led us there, and which, while now hidden, motivated the final proof.
3 Preliminaries
3.1 Probability distances
Throughout, log and ln denote the binary and natural logarithms, respectively. We identify a probability distribution over with its probability mass function (pmf), or, equivalently, a vector such that for all and . For a subset , we accordingly let . The total variation distance between two distributions over is defined as
| (6) |
where the last equality is from Scheffé’s lemma. By Cauchy–Schwarz, this gives us the relation
| (7) |
We will in this paper also consider other notions of distance between probability distributions: the squared Hellinger distance, defined as
| (8) |
(Some texts normalize this by a factor of ; we do not do so, as it makes our statements cleaner.) The chi-squared divergence is then defined as
| (9) |
while the Kullback–Leibler divergence (least relevant to us, but quite common in the literature), in nats, is defined as
| (10) |
As mentioned in Equation 3, we have the following well known [14] chain of inequalities:
| (11) |
Moreover, for the special case of the uniform distribution over , we have
| (12) |
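With these conventions (unnormalized squared Hellinger distance, KL in nats), all four quantities are easy to compute and compare numerically. The sketch below is our own, using a half-biased distribution as the example; the chain dTV² ≤ H² ≤ KL ≤ χ² asserted at the end is the standard one for these normalizations.

```python
import math

def dtv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger_sq(p, q):  # unnormalized, as in the text
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def kl(p, q):  # in nats
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def chi_sq(p, q):
    return sum((a - b) ** 2 / b for a, b in zip(p, q))

n, eps = 100, 0.3
u = [1 / n] * n
p = [(1 + eps) / n] * (n // 2) + [(1 - eps) / n] * (n // 2)
print(dtv(p, u))     # eps/2 = 0.15
print(chi_sq(p, u))  # eps^2 = 0.09; equals n * sum(p_y^2) - 1 for uniform q
# the chain of inequalities for these normalizations:
assert dtv(p, u) ** 2 <= hellinger_sq(p, u) <= kl(p, u) <= chi_sq(p, u)
```

For this family the first inequality in the chain is nearly tight, which is one reason such half-biased distributions are natural hard instances.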
3.2 Distribution access models
For a probability distribution on , we say a unitary is a synthesizer for if for some
| (13) |
where the ’s are normalized states often called “garbage states”. Note that any classical randomized circuit using gates that samples from can be converted to a synthesizer in a purely black-box way with gate complexity . (See [22] for details and a more thorough discussion of synthesizers.)
In this paper, we say an algorithm makes uses of “the code for ” to mean that we use (as a black box) the unitaries , , and controlled- a total of times in the algorithm. Each of these unitaries is easy to construct given an explicit gate decomposition of with the same gate complexity up to constant factors.
The quantum string oracle, which is used in many prior works, is a specific type of source code for . Here we have standard quantum oracle access to an -bit string for some . For any symbol , the probability is defined as the frequency with which that symbol appears in , i.e., . Note that calling this oracle on the uniform superposition over gives us a synthesizer for . When a randomized sampler for is converted to a synthesizer, we get a quantum string oracle, but quantum string oracles are not as general as arbitrary synthesizers. For example, all probabilities described by a quantum string oracle will be integer multiples of , whereas an arbitrary synthesizer has no such constraint.
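As a toy illustration of the string-oracle model (the string s below is our own example): the induced probabilities are exactly the symbol frequencies in s, and hence are always integer multiples of 1/N.

```python
from collections import Counter
from fractions import Fraction

# a string oracle for s in {0,1,2,3}^8, with N = 8
s = [0, 1, 1, 2, 2, 2, 2, 3]
N = len(s)
# Calling the oracle on the uniform superposition over i in [N] yields a
# state proportional to sum_i |i>|s_i>; measuring the symbol register gives
# symbol y with probability |{i : s_i = y}| / N.
probs = {y: Fraction(c, N) for y, c in Counter(s).items()}
print(probs)  # probabilities 1/8, 1/4, 1/2, 1/8 for symbols 0, 1, 2, 3
```

An arbitrary synthesizer, by contrast, can encode any probability vector, with no 1/N granularity constraint.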
3.3 Quantum Mean Estimation
When we use QME, we will have the source code for some distribution on , and we will also have explicitly constructed some (rational-valued) random variable (say, simply as a table). From this, one can easily generate code that outputs a sample from (i.e., outputs for ), using the code for just one time. We will then use the following QME result from [22]:
Theorem 5.
There is a computationally efficient quantum algorithm with the following guarantee: Given the source code for a random variable , as well as parameters and , the algorithm uses the code times and outputs an estimate such that , where and .
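For contrast with the QME guarantee above, here is a classical median-of-means estimator with an analogous interface; it achieves error on the order of σ/√t per group, whereas QME's error dependence on the number of queries is quadratically better. The function names and parameters here are ours, for illustration only.

```python
import random
import statistics

def median_of_means(draw, t, k, rng):
    """Classical stand-in for a mean-estimation routine: k groups of t
    samples each; each group mean errs by ~ sigma/sqrt(t), and taking the
    median drives the failure probability down to exp(-Omega(k))."""
    groups = [statistics.fmean(draw(rng) for _ in range(t)) for _ in range(k)]
    return statistics.median(groups)

rng = random.Random(3)
# toy random variable with mean 5 and standard deviation 2
est = median_of_means(lambda r: r.gauss(5.0, 2.0), t=400, k=9, rng=rng)
print(est)  # close to the true mean 5.0
```

In the paper's setting, the "draw" would be implemented by running the code for the constructed random variable, and Theorem 5's QME routine would replace this classical loop.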
4 Algorithm in the Large Distance Regime
In this section, we establish Theorem 1, our main technical contribution. We do this by proving the strictly stronger Theorem 4, which we restate more formally:
Theorem 6.
For any constant , there exists a computationally efficient quantum algorithm (Algorithm 1) with the following guarantees: on input , it makes uses (where the hidden constant depends on ) of “the code” for an unknown probability distribution over , and satisfies
1. If and , then the algorithm will accept with probability at least .
2. If , then the algorithm will reject with probability at least .
Proof.
Let us start by recording the following inequalities that we will frequently use:
| (14) |
We begin with a simple lemma regarding the check on line 4:
Proof.
Let denote the number of times is drawn. The second (“conversely”) part of the proposition follows from a standard Chernoff bound. As for the first part, suppose . Now on one hand, if , so that , we have
| (15) |
and thus for all except with probability at most , as desired. Otherwise, , and since the desired result follows from a standard Chernoff and union bound (provided is large enough). From this, we conclude:
Now to begin the QME analysis, write , where , and let , the mean of (from QME’s point of view). Writing , our first goal will be to show:
| except with probability at most ; | (16) | ||||
| except with probability at most . | (17) |
Starting with Equation 16, a short calculation (using ) shows
| (18) |
Also in Case (1) we get from Equation 18 that
| (19) |
the last inequality using Equation 14. Combining the preceding two inequalities and using Chebyshev, we indeed conclude Equation 16 (provided is sufficiently large).
Towards Equation 17, let be a certain universal constant to be chosen later, and say that is light if (i.e., ), heavy otherwise. We will write
| (20) |
and also observe
| (21) |
Let us now make some estimates. First:
| (22) |
Also, similar to our Case (1) estimates we have
| (23) |
and
| (24) |
We will now establish Equation 17; in fact, we will even show the following very slightly stronger fact:
| except with probability at most . | (25) |
We divide into two subcases:
Case (2a): .
In this case we have , and from Equation 23. Since Section 4 implies , Chebyshev’s inequality tells us that except with probability at most (provided is large enough). But then , confirming Equation 25.
Case (2b): .
In this case we have . We now use that heavy have to observe that
| (26) |
(in distribution). We see that , and moreover concentration of Binomials and Equation 22 imply that
| (27) |
provided that is a sufficiently large constant. But we can indeed ensure this by taking sufficiently large: by Equation 22, being in Case (2b), and Equation 14, it holds that
| (28) |
At the same time, Equation 23 certainly implies , and Section 4 implies (using Case (2b)). Thus Chebyshev implies
| (29) |
provided is large enough. Combining Equations 27 and 29 yields
| (30) |
which verifies Equation 25 provided is a large enough constant.
We have now verified the properties of claimed in Equations 16 and 25. Next we analyze the random variable that represents the variance of (from QME’s point of view). Our goal will be to show:
| except with probability at most , | (31) | ||||
| except with probability at most . | (32) |
Together with Equations 16 and 25, these facts are sufficient to complete the proof of the theorem, by the QME guarantee of Theorem 5.
We have:
| (33) |
where we’ve defined and . We will be making two different choices for later, but we will always assume
| (34) |
(the implication because and contains only ’s with ). Now since , we have
| (35) | ||||
| (36) | ||||
| (37) |
(where the last inequality used from Equation 14). Using Equation 34 to drop the term of Equation 37 that’s linear in the ’s, we thereby conclude
| (38) | ||||
| (39) |
where . In Case (1) we select , so , and the above bound gives
| (40) |
(provided is large enough). Now Equation 31 follows by Markov’s inequality.
In Case (2) we select , so and we conclude (using obvious notation)
| (41) |
(provided large enough), where we used and also . We now complete the bounding of in Case (2) by two different strategies:
Case (2.i): .
In this case, , and tells us , so we have
| (42) |
Now returning to Equation 37, we get
| (43) | ||||
| (44) |
where we used Equation 42 and . We can again bound the first expression in Equation 44 as . As for the second expression, either (there are no heavy ’s) or else (there is at least one heavy ). In either case, we have , so we can bound this second expression by
| (45) |
where we used sufficiently large (and we could have taken , had we been willing to assume sufficiently large). Putting this bound together with Equation 41 we obtain:
| (46) |
using Equation 25. Equation 32 now follows (in this Case (2.i)) by Markov’s inequality.
Case (2.ii): .
In this case we use a different strategy. Recall from Equation 33 that
| (47) |
By we may assume , the equality because we are in Case (2.ii). Thus
| (48) |
(provided large enough), where we used (from Equation 25) and also . This verifies Equation 32 in Case (2.ii), completing the proof.
References
- [1] Scott Aaronson and Alex Arkhipov. The computational complexity of linear optics. Theory of Computing, 9(4):143–252, 2013. doi:10.4086/toc.2013.v009a004.
- [2] Scott Aaronson and Yaoyun Shi. Quantum lower bounds for the collision and the element distinctness problems. J. ACM, 51(4):595–605, July 2004. doi:10.1145/1008731.1008735.
- [3] Jayadev Acharya, Clément L. Canonne, Yanjun Han, Ziteng Sun, and Himanshu Tyagi. Domain compression and its application to randomness-optimal distributed goodness-of-fit. In Proceedings of Thirty Third Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research, pages 3–40. PMLR, July 2020. URL: http://proceedings.mlr.press/v125/acharya20a.html.
- [4] Jayadev Acharya, Constantinos Daskalakis, and Gautam C. Kamath. Optimal Testing for Properties of Distributions. In Advances in Neural Information Processing Systems 28, pages 3577–3598. Curran Associates, Inc., 2015.
- [5] Andris Ambainis. Quantum walk algorithm for element distinctness. SIAM Journal on Computing, 37(1):210–239, 2007. doi:10.1137/S0097539705447311.
- [6] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, Brian Burkett, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Collins, William Courtney, Andrew Dunsworth, Edward Farhi, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Keith Guerin, Steve Habegger, Matthew P. Harrigan, Michael J. Hartmann, Alan Ho, Markus Hoffmann, Trent Huang, Travis S. Humble, Sergei V. Isakov, Evan Jeffrey, Zhang Jiang, Dvir Kafri, Kostyantyn Kechedzhi, Julian Kelly, Paul V. Klimov, Sergey Knysh, Alexander Korotkov, Fedor Kostritsa, David Landhuis, Mike Lindmark, Erik Lucero, Dmitry Lyakh, Salvatore Mandrà, Jarrod R. McClean, Matthew McEwen, Anthony Megrant, Xiao Mi, Kristel Michielsen, Masoud Mohseni, Josh Mutus, Ofer Naaman, Matthew Neeley, Charles Neill, Murphy Yuezhen Niu, Eric Ostby, Andre Petukhov, John C. Platt, Chris Quintana, Eleanor G. Rieffel, Pedram Roushan, Nicholas C. Rubin, Daniel Sank, Kevin J. Satzinger, Vadim Smelyanskiy, Kevin J. Sung, Matthew D. Trevithick, Amit Vainsencher, Benjamin Villalonga, Theodore White, Z. Jamie Yao, Ping Yeh, Adam Zalcman, Hartmut Neven, and John M. Martinis. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, October 2019. doi:10.1038/s41586-019-1666-5.
- [7] Aleksandrs Belovs. Quantum algorithms for classical probability distributions. In Proceedings of the 27th Annual European Symposium on Algorithms (ESA), pages 50–59. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.1007/978-3-030-19955-5_5.
- [8] Sergio Boixo, Sergei V. Isakov, Vadim N. Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, Michael J. Bremner, John M. Martinis, and Hartmut Neven. Characterizing quantum supremacy in near-term devices. Nature Physics, 14(6):595–600, April 2018. doi:10.1038/s41567-018-0124-x.
- [9] Gilles Brassard, Peter Høyer, and Alain Tapp. Quantum counting. In Proceedings of the 25th Annual International Colloquium on Automata, Languages, and Programming (ICALP), pages 820–831. Springer-Verlag, 1998. doi:10.1007/bfb0055105.
- [10] Sergey Bravyi, Aram Harrow, and Avinatan Hassidim. Quantum algorithms for testing properties of distributions. Transactions on Information Theory, 57(6):3971–3981, 2011. doi:10.1109/TIT.2011.2134250.
- [11] Clément L. Canonne. Topics and techniques in distribution testing: A biased but representative sample. Foundations and Trends® in Communications and Information Theory, 19(6):1032–1198, 2022. doi:10.1561/0100000114.
- [12] Sourav Chakraborty, Eldar Fischer, Arie Matsliah, and Ronald de Wolf. New Results on Quantum Property Testing. In 30th International Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2010), volume 8 of Leibniz International Proceedings in Informatics (LIPIcs), pages 145–156, Dagstuhl, Germany, 2010. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.FSTTCS.2010.145.
- [13] Ilias Diakonikolas and Daniel M. Kane. A new approach for testing properties of discrete distributions. In 57th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2016. IEEE Computer Society, 2016.
- [14] Alison Gibbs and Francis Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.
- [15] András Gilyén and Tongyang Li. Distributional Property Testing in a Quantum World. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020), volume 151 of Leibniz International Proceedings in Informatics (LIPIcs), pages 25:1–25:19, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2020.25.
- [16] Oded Goldreich. The uniform distribution is complete with respect to testing identity to a fixed distribution. Electronic Colloquium on Computational Complexity (ECCC), 23:15, 2016. URL: http://eccc.hpi-web.de/report/2016/015.
- [17] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Technical Report TR00-020, Electronic Colloquium on Computational Complexity (ECCC), 2000.
- [18] Lov Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the 28th Annual ACM Symposium on the Theory of Computing (STOC), pages 212–219. ACM, New York, 1996. doi:10.1145/237814.237866.
- [19] Lov Grover. A framework for fast quantum mechanical algorithms. In Proceedings of the 30th Annual ACM Symposium on the Theory of Computing (STOC), pages 53–62. ACM, New York, 1998. doi:10.1145/276698.276712.
- [20] Yassine Hamoudi. Quantum Algorithms for the Monte Carlo Method. PhD thesis, Université de Paris, 2021.
- [21] Stefan Heinrich. Quantum summation with an application to integration. Journal of Complexity, 18(1):1–50, 2002. doi:10.1006/jcom.2001.0629.
- [22] Robin Kothari and Ryan O’Donnell. Mean estimation when you have the source code; or, quantum Monte Carlo methods, pages 1186–1215. Society for Industrial and Applied Mathematics, January 2023. doi:10.1137/1.9781611977554.ch44.
- [23] Samuel Kutin. Quantum lower bound for the collision problem with small range. Theory of Computing, 1(2):29–36, 2005. doi:10.4086/toc.2005.v001a002.
- [24] Jingquan Luo, Qisheng Wang, and Lvzhou Li. Succinct quantum testers for closeness and k-wise uniformity of probability distributions. IEEE Transactions on Information Theory, 70(7):5092–5103, 2024.
- [25] Ashley Montanaro. Quantum speedup of Monte Carlo methods. Proceedings of the Royal Society A, 471(2181):20150301, 20, 2015. doi:10.1098/rspa.2015.0301.
- [26] Ashwin Nayak and Felix Wu. The quantum query complexity of approximating the median and related statistics. In Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, STOC ’99, pages 384–393, New York, NY, USA, 1999. Association for Computing Machinery. doi:10.1145/301250.301349.
- [27] Ryan O’Donnell and John Wright. Learning and testing quantum states via probabilistic combinatorics and representation theory. In Current developments in mathematics 2021, pages 43–94. International Press, Somerville, MA, 2023.
- [28] Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750–4755, 2008. doi:10.1109/TIT.2008.928987.
Appendix A Algorithm in the Small Distance Regime
In this appendix, we provide an alternative (and arguably simpler) proof of the main result of [24]:
Theorem 8.
There is a computationally efficient quantum algorithm (Algorithm 2) for uniformity testing with the following guarantees: it takes “samples” from an unknown probability distribution over , and distinguishes with probability at least between (1) , and (2) .
This in turn will follow from the more general result on tolerant closeness testing, where one is given access to the source code for two unknown probability distributions over , and one seeks to distinguish from .
Theorem 9.
There is a computationally efficient quantum algorithm (Algorithm 2) for closeness testing with the following guarantees: it takes “samples” from two unknown probability distributions over , and distinguishes with probability at least between (1) , and (2) .
Theorem 8 can then be obtained as a direct corollary by setting , recalling that when the reference distribution is uniform, squared ℓ2 distance and χ2 divergence are equivalent up to a factor n: χ2(p, u_n) = n · ||p − u_n||_2^2.
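As a sanity check, the following snippet verifies the standard identity χ2(p, u_n) = n · ||p − u_n||_2^2 on a random probability vector; the normalization used is the usual definition of the chi-squared divergence, assumed here since the paper's displayed equation is not reproduced above.

```python
import random

def chi_squared_to_uniform(p):
    """Chi-squared divergence from p to the uniform distribution over len(p) elements."""
    n = len(p)
    return sum((pi - 1.0 / n) ** 2 / (1.0 / n) for pi in p)

def l2_sq_to_uniform(p):
    """Squared ell_2 distance from p to the uniform distribution."""
    n = len(p)
    return sum((pi - 1.0 / n) ** 2 for pi in p)

random.seed(0)
n = 16
w = [random.random() for _ in range(n)]
p = [wi / sum(w) for wi in w]   # a random probability vector over [n]
assert abs(chi_squared_to_uniform(p) - n * l2_sq_to_uniform(p)) < 1e-12
```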
We emphasize that the result of Theorem 9 itself is not new, as a quantum algorithm achieving the same sample complexity (in the same access model) was recently obtained by [24]. (Technically, the result of [24] can be seen as slightly stronger, in that it allows one to test vs. , for an arbitrarily small constant.) However, our algorithm differs significantly from the one in [24], and we believe it to be of independent interest for several reasons:
- it is conceptually very simple: (classically) hash the domain down to two elements, and use QME to estimate the bias of the resulting Bernoulli;
- it neatly separates the quantum and classical aspects of the task, only using QME (as a blackbox) in a single step of the algorithm;
- in contrast to the algorithm of [24], it decouples the use of the source code for the two distributions, allowing one to run our algorithm even when access to the two distributions is granted on different machines, in different locations, or at different points in time (i.e., one can run part of the algorithm using the source code for one distribution and, one continent and a year apart, run the remaining part on the now-available source code for the other).
The idea behind Theorem 9 is relatively simple: previous work (in the classical setting) showed that hashing the domain from to a much smaller could yield sample-optimal testing algorithms in some settings, e.g., when testing under privacy, bandwidth, or memory constraints. Indeed, while this "domain compression" reduces the total variation distance by a factor , this shrinkage is, in these settings, balanced by the reduction in domain size. The key insight in our algorithm is then to (1) use this hashing with respect to ℓ2 distance, not total variation distance, and show that one can in this case get a two-sided guarantee in the ℓ2 distance (a low-distortion embedding) instead of a one-sided one; and (2) compress the domain all the way down to two elements, so that one can then invoke the QME algorithm to simply estimate the bias of a coin to an additive , a task for which a quadratic quantum speedup is well known.
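The hashing step can be checked exactly on a small domain. Assuming each element is placed in S independently with probability 1/2 (matching a uniformly random subset, as in the binary hashing lemma below), p(S) − q(S) is a sum of centered terms whose differences sum to zero, giving E_S[(p(S) − q(S))^2] = ||p − q||_2^2 / 4. The sketch enumerates all subsets to verify this identity:

```python
from itertools import product

def subset_moment(p, q):
    """Average of (p(S) - q(S))^2 over all 2^n subsets S of the domain."""
    n = len(p)
    total = 0.0
    for bits in product([0, 1], repeat=n):
        d = sum(b * (pi - qi) for b, pi, qi in zip(bits, p, q))
        total += d * d
    return total / 2 ** n

p = [0.4, 0.3, 0.2, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
l2_sq = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
# The hashed gap retains a quarter of the squared ell_2 distance in expectation.
assert abs(subset_moment(p, q) - l2_sq / 4) < 1e-12
```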
Proof of Theorem 9.
As mentioned above, a key building block of our algorithm is the following “binary hashing lemma,” a simple case of the domain compression primitive of [3]:
Lemma 10 (Random Binary Hashing; Lemma 2.9 and Remark 2.4 of [11]).
Let . Then, for every ,
where is a uniformly random subset of .
Given our goal of tolerant testing, we also require a converse to Lemma 10, stated and proven below:
Lemma 11.
Let . Then, for every ,
where is a uniformly random subset of .
Proof.
As in the proof of Lemma 10, we write and , where for i.i.d. Rademacher. We will use the following fact, established in the proof of that lemma, which we reproduce for completeness:
| (49) |
By Markov’s inequality, we then have
concluding the proof.

While the above two lemmas allow us to obtain a slightly more general result than the theorem statement by keeping as free parameters, for concreteness, set and . This implies the following:
-
If , then
-
If , then
where is a uniformly random subset of . This allows us to distinguish between the two cases with only repetitions:
A standard analysis shows that, for a sufficiently large constant, with probability at least the estimate will be within an additive of the corresponding value (either or ), in which case the output is correct. The total number of samples required is times the sample complexity of the Quantum Mean Estimation call on Line 4, which is : the complexity of obtaining a -additive estimate of the mean of a Bernoulli random variable with high (constant) probability. This concludes the proof.
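The overall structure of the algorithm can be sketched classically, with the QME call replaced by an exact evaluation of the hashed bias (this exact evaluation is precisely where the quantum speedup would enter; the repetition count below is illustrative, not the paper's constant):

```python
import random

def hashed_gap_sq(p, q, rng):
    """One repetition: pick a random subset S, return (p(S) - q(S))^2.
    In the quantum algorithm, the bias p(S) - q(S) would be estimated by QME."""
    gap = sum((pi - qi) for pi, qi in zip(p, q) if rng.random() < 0.5)
    return gap * gap

def estimate(p, q, reps=200, seed=1):
    """Average the squared hashed gap over independent random hashes."""
    rng = random.Random(seed)
    return sum(hashed_gap_sq(p, q, rng) for _ in range(reps)) / reps

n = 32
u = [1.0 / n] * n
far = [2.0 / n] * (n // 2) + [0.0] * (n // 2)   # TV distance 1/2 from uniform
assert estimate(u, u) == 0.0    # identical distributions: every hashed gap is 0
assert estimate(u, far) > 0.0   # far distributions: strictly positive estimate
```

Thresholding this estimate (against the bounds supplied by the two hashing lemmas) is what separates the close and far cases in the actual algorithm.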
Appendix B Algorithm in the Giant Distance Regime
In this appendix, we show that, in the (stronger) quantum string oracle model, one can perform tolerant uniformity testing with respect to divergence in the “very large parameter regime,” that is, to distinguish from for :
Theorem 12.
There is a computationally efficient quantum algorithm for uniformity testing with the following guarantees: For , the algorithm makes calls to the quantum string oracle for an unknown distribution over , and distinguishes with probability at least between
| (50) |
where is an absolute constant. Moreover, this query complexity is optimal.
Note that, as discussed in the introduction, this result does not imply anything in terms of total variation distance, as the latter is always at most 1; however, we believe this result to be of interest for at least two reasons: (1) it is in itself a reasonable (and often useful) testing question when total variation distance is not the most relevant distance measure, and implies, for instance, testing from ; and (2) one can show that this complexity is tight, by a reduction from the -to-1 collision problem, which provides additional evidence for Conjecture 2.
Proof.
The main ingredient of the proof is the following lemma, which guarantees that taking from the unknown distribution is enough to obtain (with high constant probability) a multiset of elements with, in one case, no collisions, and in the other at least one collision:
Lemma 13.
For , there exists a constant such that taking i.i.d. samples from an unknown over results in a multiset satisfying the following with probability at least :
-
If , then all elements in are distinct;
-
If , then at least two elements in are identical;
as long as . (In particular, taking suffices to ensure such a choice of is possible.)
Before proving this lemma, we describe how it implies our stated complexity upper bound. Lemma 13 guarantees that we can reduce our testing problem to that of deciding, given oracle access to a string of size , whether all the elements in it are distinct. This problem is solved by Ambainis' element distinctness quantum-walk algorithm [5] using quantum queries.
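The reduction can be illustrated by a small simulation: draw a multiset of m samples and check it for a repeated element. The sample size m = 4·sqrt(n/r) and the trial counts below are demo choices, not the constants of Lemma 13.

```python
import random

def has_collision(samples):
    """True iff the multiset contains at least one repeated element."""
    return len(set(samples)) < len(samples)

rng = random.Random(42)
n, r = 10**6, 100
m = 400  # ~ 4 * sqrt(n / r) samples

# Case 1: samples from the uniform distribution on [n] -- collisions are rare.
uniform_hits = sum(
    has_collision([rng.randrange(n) for _ in range(m)]) for _ in range(20)
)
# Case 2: samples uniform on a subset of size n/r -- a collision is very likely.
subset_hits = sum(
    has_collision([rng.randrange(n // r) for _ in range(m)]) for _ in range(20)
)
assert uniform_hits <= 10   # each trial collides w.p. ~ m^2/(2n) ~ 8%
assert subset_hits >= 15    # each trial collides w.p. ~ 1 - exp(-8)
```

The quantum algorithm then decides the all-distinct question on such a multiset with Ambainis' element distinctness walk rather than by inspecting every sample.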
Proof of Lemma 13.
Suppose we take i.i.d. samples from , and count the number of collisions among them:
Letting and for all integer and vector (so that for all ), we have, , and
Now, it is not hard to verify that , and
| (51) |
From this, we get, setting :
-
If , then , and as long as we have , so that by Markov’s inequality
This proves the lemma.

This concludes the proof of the upper bound part of Theorem 12. It only remains to show that this query complexity is, indeed, optimal. For this, we need a lower bound of [23], which generalized a lower bound of Aaronson and Shi [2]:
Theorem 14 ([23]).
Let n and r be integers such that r divides n, and let f : [n] → [n] be a function to which we have quantum oracle access. Then deciding if f is 1-to-1 or r-to-1, promised that one of these holds, requires Ω((n/r)^{1/3}) quantum queries.
When we view this function as a quantum string oracle for a probability distribution, the function being 1-to-1 corresponds to the uniform distribution on [n]. In the other case, the distribution is uniform on a subset of size n/r, for any r dividing n. An easy calculation shows that the second distribution is at chi-squared divergence

r - 1 | (52) |

from uniform, which completes the proof.
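The divergence calculation can be verified directly, assuming the standard chi-squared divergence (sum of (p_i − q_i)^2 / q_i): a distribution uniform on a subset of size n/r is at chi-squared divergence exactly r − 1 from uniform on [n].

```python
def chi2_to_uniform(p, n):
    """Chi-squared divergence from p (given on its support) to uniform on [n];
    the n - len(p) off-support elements each contribute (0 - 1/n)^2/(1/n) = 1/n."""
    on_support = sum((pi - 1.0 / n) ** 2 / (1.0 / n) for pi in p)
    off_support = (n - len(p)) * (1.0 / n)
    return on_support + off_support

n, r = 120, 6
support = n // r                 # 20 elements, each carrying mass r/n
p = [r / n] * support
assert abs(chi2_to_uniform(p, n) - (r - 1)) < 1e-9   # divergence is r - 1 = 5
```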
Appendix C Reduction from Identity to Uniformity Testing
As mentioned in the introduction, there is a known reduction from identity to uniformity testing, due to Goldreich [16] and inspired by [13], which, in a blackbox way, converts an instance of identity testing (in total variation distance) with reference distribution over and distance parameter to an instance of uniformity testing over and distance parameter . (Here, we follow the exposition and parameter setting of [11, Section 2.2.3].)
To be able to use it in our setting, all we need to check is that this blackbox reduction preserves access to "the code": that is, given the code for a probability distribution over , we can efficiently obtain the code for the resulting distribution over . To do so, note that the reduction is the composition of three successive mappings,
where , , and , . So it suffices to show that each of these three mappings preserves access to the code generating a sample from the resulting distribution.
- The first, , is the easiest, as it consists only of mixing its input with the uniform distribution:
for which a circuit can be easily obtained, given a circuit for .
- The second, , "rounds down" the probability of each element of the domain, and sends the remaining probability mass to a newly added element:
This corresponds to adding to the circuit for a “postprocessing circuit” which, if the output of is , outputs with probability (and otherwise).
- The third, , assumes that the reference distribution is "grained" (namely, all its probabilities are positive multiples of ), which will be the case after the first two mappings. (Specifically, when chaining the three mappings, the reference distribution called here is actually the output of the first two mappings, and hence fully known.) Having partitioned into sets where
and is given by
This corresponds to adding to the circuit for a “postprocessing circuit” which, if the output of is , outputs an element of uniformly at random. (Importantly, are uniquely determined by , and do not depend on or at all.)
To summarize, each of these three mappings can be implemented to provide, given a circuit for , a circuit for the output , so that altogether the reduction can be implemented in a way which preserves access to “the code.”
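The three mappings can be traced on explicit probability vectors, with exact rational arithmetic standing in for the circuit post-processing. The mixing weight 1/2 and grain size m = 4n below are illustrative choices, not the parameters of [11, Section 2.2.3]; the key invariant, checked at the end, is that the reference distribution maps to the exactly uniform distribution over m elements while a different tested distribution maps to something non-uniform.

```python
from fractions import Fraction

def m1(p):
    """First mapping: mix with the uniform distribution, p -> (p + u_n)/2."""
    n = len(p)
    u = Fraction(1, n)
    return [(pi + u) / 2 for pi in p]

def m2(p, q_mixed, m):
    """Second mapping: round each reference mass q_i down to a multiple of 1/m;
    the tested distribution keeps a floor(m*q_i)/(m*q_i) fraction of its mass on
    element i, and the shaved-off mass is rerouted to one extra element."""
    kept = [pi * Fraction(int(m * qi), 1) / (m * qi)
            for pi, qi in zip(p, q_mixed)]
    return kept + [1 - sum(kept)]

def m3(p_grained, q_grained, m):
    """Third mapping: split element i into floor(m*q_i) buckets, spreading its
    mass evenly, so the grained reference becomes uniform over m buckets."""
    out = []
    for pi, qi in zip(p_grained, q_grained):
        k = int(m * qi)
        out += [pi / k] * k if k > 0 else []
    return out

q = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
n, m = len(q), 4 * len(q)
q_mixed = m1(q)
q2 = m2(q_mixed, q_mixed, m)
q3 = m3(q2, q2, m)
# The reference distribution is mapped to exactly uniform over m elements.
assert sum(q3) == 1 and all(x == Fraction(1, m) for x in q3)

p = [Fraction(1, 4)] * 4                    # a tested distribution with p != q
p3 = m3(m2(m1(p), q_mixed, m), q2, m)
tv = sum(abs(x - Fraction(1, m)) for x in p3) / 2
assert sum(p3) == 1 and tv > 0              # p != q maps away from uniform
```

Each mapping only post-processes the output of the previous sampler, which is why a circuit for the input distribution yields a circuit for the output distribution.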