Abstract 1 Introduction 2 Preliminaries 3 Polynomials and fractional block sensitivity 4 Random restrictions for polynomials 5 Random projections for quantum query algorithms 6 Oracle Separation for 𝗤𝗖𝗠𝗔 Hierarchy References

Oracle Separations for the Quantum-Classical Polynomial Hierarchy

Avantika Agarwal Institute for Quantum Computing, University of Waterloo, Canada Shalev Ben-David ORCID Institute for Quantum Computing, University of Waterloo, Canada
Abstract

We study the quantum-classical polynomial hierarchy, 𝖰𝖢𝖯𝖧, which is the class of languages solvable by a constant number of alternating classical quantifiers followed by a quantum verifier. Our main result is that 𝖰𝖢𝖯𝖧 is infinite relative to a random oracle (previously, this was not even known relative to any oracle). We further prove that higher levels of 𝖯𝖧 are not contained in lower levels of 𝖰𝖢𝖯𝖧 relative to a random oracle; this is a strengthening of the somewhat recent result that 𝖯𝖧 is infinite relative to a random oracle (Rossman, Servedio, and Tan 2016).

The oracle separation requires lower bounding a certain type of low-depth alternating circuit with some quantum gates. To establish this, we give a new switching lemma for quantum algorithms which may be of independent interest. Our lemma says that for any d, if we apply a random restriction to a function f with quantum query complexity Q(f)n1/3, the restricted function becomes exponentially close (in terms of d) to a depth-d decision tree. Our switching lemma works even in a “worst-case” sense, in that only the indices to be restricted are random; the values they are restricted to are chosen adversarially. Moreover, the switching lemma also works for polynomial degree in place of quantum query complexity.

Keywords and phrases:
Switching Lemma, Polynomial Hierarchy, Approximate Degree, Random Oracles, Query Complexity, Quantum Computing
Copyright and License:
[Uncaptioned image] © Avantika Agarwal and Shalev Ben-David; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Oracles and decision trees
; Theory of computation Complexity classes ; Theory of computation Quantum complexity theory
Related Version:
Full Version: https://arxiv.org/abs/2410.19062 [2]
Funding:
This research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC), DGECR-2019-00027 and RGPIN-2019-04804.111Cette recherche a été financée par le Conseil de recherches en sciences naturelles et en génie du Canada (CRSNG), DGECR-2019-00027 et RGPIN-2019-04804.
Editor:
Shubhangi Saraf

1 Introduction

In classical complexity theory, an important complexity class is the polynomial hierarchy, 𝖯𝖧. This is a generalization of 𝖭𝖯 to higher depth: it can be written as the union 𝖭𝖯𝖭𝖯𝖭𝖯𝖭𝖯𝖭𝖯𝖭𝖯, and corresponds to languages that can be computed using a constant number of alternating quantifiers over certificates. A problem is in 𝖯𝖧 if it can be computed in polynomial time in the presence of two computationally-unbounded provers, one of which wants to convince the verifier the input is a yes-input, and one of which wants to convince the verifier the input is a no-input, with the provers exchanging polynomially-sized public messages for a constant number of rounds. The polynomial hierarchy can therefore be viewed as the class of problems solvable by an audience member sitting in a debate between experts, where the debate has the inefficient format of alternating between the two speakers only a constant number of times, even though the total debate time is polynomial in the input size (polynomially many alternations would result in the larger class 𝖯𝖲𝖯𝖠𝖢𝖤).

The polynomial hierarchy can be viewed as a union of different “levels”, where the d-th level corresponds to debates with d alternations. The hierarchy is said to collapse to level d if level d+1 can solve no more problems than level d; in that case, all higher levels can also be simulated with d rounds of debate. It is widely believed that the polynomial hierarchy is infinite, meaning it does not collapse to any level. This is a generalization of the 𝖯𝖭𝖯 conjecture, which is equivalent to the assertion that 𝖯𝖧 does not collapse to the 0-th level.

Since proving the polynomial hierarchy is infinite is beyond current techniques, one may ask instead for oracle separations. It turns out 𝖯𝖧 is infinite even relative to a random choice of oracle (with probability 1), though this result is somewhat recent.

Theorem 1 ([13]).

𝖯𝖧 is infinite relative to a random oracle.

We study a quantum version of the polynomial hierarchy known as the quantum-classical hierarchy (introduced in [11]). This is the class of problems solvable by a quantum audience member of a constant-round debate (with classical messages) between experts. This class, denoted 𝖰𝖢𝖯𝖧, generalizes 𝖰𝖢𝖬𝖠 but not 𝖰𝖬𝖠 (since the proofs received are classical rather than quantum). More formally, 𝖰𝖢𝖯𝖧 is the union i𝖰𝖢Σi, where 𝖰𝖢Σi is defined as the class of languages for which there is a quantum verifier V satisfying

xAyesy1y2Qiyi:Pr[V(x,y1,y2,,yi)]2/3
xAnoy1y2Qi¯yi:Pr[V(x,y1,y2,,yi)]1/3

for all input strings x, where the yj represent polynomially-sized classical strings, Qi is a quantifier, and Qi¯ is the opposite quantifier.

The class 𝖰𝖢𝖯𝖧 is interesting on its own, but another motivation for its study is the connection to quantum switching lemmas. Oracle separations for 𝖯𝖧 generally reduce to the problem of giving lower bounds on the classical circuit class 𝖠𝖢0, consisting of circuits of constant depth. Quantum versions of constant-depth circuits are of interest because they help model quantum devices with many qubits but few layers of gates. Lower bounds on such circuit classes are often shown using switching lemmas, which assert that certain types of functions must greatly simplify under a random restriction of their bits; these switching lemmas can therefore be useful both for the study of near-term quantum devices and for oracle separations for complexity classes such as 𝖰𝖢𝖯𝖧.

1.1 Our results

Our main result generalizes Theorem 1 to the quantum setting.

Theorem 2.

The following holds relative to a random oracle with probability 1. For any constant d1, level d+1 of the polynomial hierarchy, Σd+1, is not contained in level d of the quantum-classical hierarchy, 𝖰𝖢Πd. In particular, 𝖰𝖢𝖯𝖧 is infinite, and no fixed level of the 𝖰𝖢𝖯𝖧 hierarchy contains all of 𝖯𝖧.

Previously to this work, it was not even known whether there was any oracle relative to which 𝖰𝖢𝖯𝖧 is infinite, let alone a random oracle. Previous switching lemmas for quantum algorithms, such as the one in [1], are insufficient to prove such an oracle separation.222Most quantum circuit lower bounds are incomparable to ours, since they consider different models of quantum circuits. Our switching lemma is stronger than the comparable switching lemma of [1]. We also note that this theorem implies Theorem 1 as a special case, since it also implies that 𝖯𝖧 is infinite relative to a random oracle.

The study of 𝖯𝖧 relative to a random oracle boils down to the study of 𝖠𝖢0 circuits against the uniform distribution. This, in turn, is usually done using random restrictions and the switching lemma, together with the random projection technique introduced in [13]. To prove Theorem 2, we need a new switching lemma for quantum query complexity.

Theorem 3.

Let N be sufficiently large, and let f be a possibly partial Boolean function on N bits with Q(f)N1/3. Let p=N10/11 (we can choose pO(1Q(f)2N)4/7), and consider a random restriction of f which fixes each bit with probability 1p. Then for any dN1/3, this restriction is approximated by a decision tree of height d in the following sense.

For an input x and a choice of unrestricted bits S, we define a restriction which sets bits in S¯ to according to x. Then for every input x, there is an ensemble of decision trees {Dx,S}x{0,1}n,S[N], all of height at most d, with Dx,S acting on the unrestricted input bits which are strings of length |S|, such that the following holds: if S[N] is chosen at random with each index i[N] included in S independently with probability p, then

x,y{0,1}nPrS[Dx,S(y|S)fxS¯(y|S)]<ed1/5.

Here fxS¯ denotes the restriction of f to the partial assignment which fixes the bits of x outside of S.333If fxS¯ is undefined on input y|S, we count it as equality holding in the probability, meaning that on inputs outside the domain the decision tree is allowed to output anything.

This is a type of switching lemma for quantum query complexity, though its statement can be confusing. We make a few clarifying comments. First, note that with the parameter p=N10/11, a function f with quantum query complexity at most N1/3 becomes constant with high probability under a random restriction. However, the probability of the function becoming constant is merely 11/poly(N), which is not small enough. The point of the theorem is to approximate the function f to exponentially small error (in the height of the approximating decision tree).

The random restriction in Theorem 3 is worst-case in the sense that while the positions to be fixed are chosen randomly, the bits to which those positions are fixed are specified by a string x, which can be chosen arbitrarily. Moreover, the resulting restricted function f|xS¯ must be approximated by a decision tree even on worst-case choices of input y|S (except that the choice of the worst-case y cannot depend on the random choice of S). From such a worst-case statement, we can easily derive a more familiar average-case switching lemma.

Corollary 4.

For sufficiently large N, let f be such that Q(f)N1/3, let p=N10/11, and let dN1/3. If fρ is a random restriction in which each bit remains unfixed with probability p (and is fixed randomly to 0 or 1 otherwise), then with probability at least 1ed1/10, there is a decision tree of height at most d which computes fρ on a 1ed1/10 fraction of its inputs.

Theorem 3 can be viewed as a version of Corollary 4 which works even for non-uniform distributions. We also strengthen Theorem 3 so that it works for approximate degree instead of just quantum query complexity.

Theorem 5.

Theorem 3 still holds if the condition on f is replaced by deg~(f)N1/3 instead of Q(f)N1/3.

This can be viewed as establishing a random-restriction version of the Aaronson-Ambainis conjecture, which asserts that low degree polynomials can be approximated by shallow decision trees against the uniform distribution. Our version works only after a random restriction is applied to the polynomial, but it works with extremely strong parameters444Note that Aaronson-Ambainis conjecture is about bounded real-valued functions rather than Boolean functions; our switching lemma also works in that setting.. Another (incomparable) version of the Aaronson-Ambainis conjecture for random restrictions was given in [7].

Although we don’t explicitly show it, Theorem 5 can be strengthened further, so that it works with a measure known in the literature as (the square root of) “critical fractional block sensitivity”, which lower bounds approximate degree [4]. This measure also lower bounds the positive-weight quantum adversary bound, so our switching lemma also works for that measure.

Finally, we give an application of our switching lemmas to yet another type of oracle separation: we show that the “𝖰𝖢𝖬𝖠 hierarcy,” that is,

𝖰𝖢𝖬𝖠𝖰𝖢𝖬𝖠𝖰𝖢𝖬𝖠𝖰𝖢𝖬𝖠𝖰𝖢𝖬𝖠𝖰𝖢𝖬𝖠

is also infinite relative to a random oracle. Explicitly, defining 𝖰𝖢𝖬𝖠𝖧1=𝖰𝖢𝖬𝖠 and 𝖰𝖢𝖬𝖠𝖧d+1=𝖰𝖢𝖬𝖠𝖰𝖢𝖬𝖠𝖧d for d1, we have the following result.

Theorem 6.

The following holds relative to a random oracle with probability 1. For any constant d1, 𝖰𝖢𝖬𝖠𝖧d+1 is not contained in 𝖰𝖢𝖬𝖠𝖧d. Moreover, no fixed level 𝖰𝖢𝖬𝖠𝖧d contains all of 𝖯𝖧.

Proving lower bounds on 𝖰𝖢𝖬𝖠𝖧 relative to an oracle amounts to proving lower bounds on constant-depth circuits which have alternating layers of polynomial-fanin AND gates, and of “gates” consisting of quantum query algorithms which make few quantum queries. This demonstrates that our techniques can be useful for proving lower bounds on shallow quantum circuit classes.

1.2 Our techniques

Random oracle separations.

To prove our oracle separation for 𝖰𝖢𝖯𝖧 relative to a random oracle, it suffices to show that depth d 𝖠𝖢0 circuits with “quantum query complexity gates” at the bottom layer cannot compute some function in computable in depth d+2 classical 𝖠𝖢0, against the uniform distribution. The quantum query complexity gates means that at the bottom of the d alternating layers of AND and OR gates lie Boolean functions which can each be computed in polylogN quantum queries. (For the 𝖰𝖢𝖬𝖠𝖧 separation, we need to allow quantum query complexity gates in the middle of the circuit as well, not just the bottom.) This circuit separation implies the random oracle separation through a standard technique sometimes called “slow diagonalization” (see full version).

To construct a function which is computable in depth d+2 classically but not in depth d quantumly (against the uniform distribution), we use the construction of [13], which separated classical 𝖠𝖢0 circuits of different depths. Their proof used random projections, which we also employ. We must modify their construction to account for the extra layer of quantum gates, which we will need to “strip away” in the analysis using a random projection and our quantum switching lemma.

In order to establish a depth-hierarchy theorem for circuits, the restrictions used need to ensure that an AND-OR tree retains structure (as in [12] and [13]) while a smaller-depth circuit simplifies. Thus the sequence of restrictions/projections used needs to alternate between heavily biased towards 1 and heavily biased towards 0 (so that not all AND/OR gates are set to constant). In the quantum switching lemma of [1], the quantum query algorithm becomes close to a DNF with respect to the uniform distribution, when the restriction is also sampled uniformly. We will however need to sample a projection from an underlying distribution different from the one with respect to which we compare the closeness of the resulting quantum query algorithm and DNF (because of the alternating choice of distributions described earlier). This mismatch between the two distributions requires us to use our “worst-case” type of quantum switching lemma.

Our separation for the 𝖰𝖢𝖬𝖠𝖧 hierarchy works similarly, but requires carefully changing the parameters to ensure the structure is maintained even in the presence of intermediate quantum query gates. See Section 6 for details.

Quantum switching lemma.

The proof of our quantum switching lemma relies on an adaptive application of the quantum hybrid method, which may be of independent interest. Given a quantum algorithm Q acting on a string x, a standard hybrid argument of [6] says there will only be a small set of positions in the string x that the algorithm Q “looks at”; the output of Q can only be sensitive to a change in a bit of x at one of those few positions. Call those the heavy positions of x. Now, if some of those heavy positions of x are indeed changed, then not only can the output of Q change, but even the set of heavy positions of the input can change.

This poses a problem for us: we would like to restrict the function to the bits of x, except at a few random positions. If some of those random positions are heavy (with respect to x), then the quantum algorithm can still depend on them in a nontrivial manner. We could try to mimic this by a classical algorithm which queries those few non-fixed heavy bits, but the problem is that this is not sufficient to fix the output of Q: the output of the algorithm Q may now depend on new heavy bits. (It is not clear if a classical decision tree can find these new heavy bits, since a quantum algorithm may try to query the rest of the unfixed bits in superposition.)

We get around this problem by applying the hybrid method iteratively, in an adaptive manner. Beginning with a string x, we find its heavy bits, and randomly choose a few of them to leave unfixed; we replace those bits with values from a different string y. This gives us a hybrid string x(1) which mostly contains bits of x, but where a few heavy positions were replaced by bits of y. We then iterate this process: we find the heavy bits of x(1) which are not heavy bits of x, sample a few of those positions at random, and replace them with more bits from y, resulting in x(2). We continue this way and terminate when there are no new heavy bits.

In each round, there is a constant (or better) probability of having no new heavy bits which are unfixed, since the number of heavy bits is small and the probability of leaving a bit unfixed is also small. This means the number of rounds cannot be too large except with exponentially small probability. Once the process terminates, the set of all heavy bits encountered along the way cannot be too large (except with exponentially small probability), and can be used to compute the randomly-restricted function on most inputs.

Polynomial lower bound.

Since our quantum switching lemma relies fundamentally on the hybrid method, which talks about “where the quantum algorithm queried the string”, it may seem surprising that we can generalize our result to a switching lemma for polynomials as well. To do this, we rely on a property shown in [4] (expanding on earlier work by [14]) which says that approximate degree is lower bounded by fractional block sensitivity. Fractional block sensitivity, in turn, is equal to fractional certificate complexity, a measure which assigns to each bit of each input a non-negative weight: a way of saying “how much did the polynomial look at position i when given string x as input”.

In fact, we will need a stronger version of the result of [4]; we adapt their method to prove this. Our result is the following theorem, which may be of independent interest.

Theorem 7.

Let f:{0,1}n[0,1] be a real-valued function which can be expressed as a polynomial of degree at most d. Then there is an assignment of weights {cx,i}x{0,1}n,i[n] to inputs x and positions i[n] such that for all x,y{0,1}n, we have

i:xiyicx,i|f(x)f(y)|,i=1ncx,iπ24d2.

The proof is similar to the one in [4]; it uses strong duality of linear programming to convert fractional certificates to fractional block sensitivity, and uses composition of f with a version of promise-OR to convert fractional block sensitivity to regular block sensitivity; the latter can be turned into sensitivity using a projection, and results in approximation theory can be used to relate sensitivity to polynomial degree.

2 Preliminaries

Definition 8 (Projection).

Given a function f:{0,1}n×lR, and a restriction ρ in {0,1,}n×l, the projected function 𝗉𝗋𝗈𝗃ρf:{0,1}nR is defined as 𝗉𝗋𝗈𝗃ρf(x)=f(y) where

yi,j={xiif ρ(i,j)=ρ(i,j)otherwise

So the projection operator maps all unrestricted variables in a given block of size l, to the same new variable. If ρ is a random restriction, then 𝗉𝗋𝗈𝗃ρ is a random projection.

2.1 Quantum-Classical Polynomial Hierarchy

Definition 9 (𝖰𝖢Σi).

Let A=(Ayes,Ano) be a promise problem. We say that A is in 𝖰𝖢Σi(c,s) for poly-time computable functions c,s:[0,1] if there exists a poly-bounded function p: and a poly-time uniform family of quantum circuits {Vn}n such that for every n-bit input x, Vn takes in classical proofs y1{0,1}p(n),,yi{0,1}p(n) and outputs a single qubit, such that:

  • Completeness: xAyes y1y2Qiyi s.t. Pr[Vn accepts (y1,,yi)]c.

  • Soundness: xAno y1y2Q¯iyi s.t. Pr[Vn accepts (y1,,yi)]s.

Here, Qi equals when i is odd and equals otherwise and Q¯i is the complementary quantifier to Qi. Define

𝖰𝖢Σi:=csΩ(1/poly(n))𝖰𝖢Σi(c,s).

Note that the first level of this hierarchy corresponds to 𝖰𝖢𝖬𝖠. The complement of the ith level of the hierarchy, 𝖰𝖢Σi, is the class 𝖰𝖢Πi defined next.

Definition 10 (𝖰𝖢Πi).

Let A=(Ayes,Ano) be a promise problem. We say that A𝖰𝖢Πi(c,s) for poly-time computable functions c,s:[0,1] if there exists a polynomially bounded function p: and a poly-time uniform family of quantum circuits {Vn}n such that for every n-bit input x, Vn takes in classical proofs y1{0,1}p(n),,yi{0,1}p(n) and outputs a single qubit, such that:

  • Completeness: xAyes y1y2Qiyi s.t. Pr[Vn accepts (y1,,yi)]c.

  • Soundness: xAno y1y2Q¯iyi s.t. s.t. Pr[Vn accepts (y1,,yi)]s.

Here, Qi equals when i is odd and equals otherwise, and Q¯i is the complementary quantifier to Qi. Define

𝖰𝖢Πi:=csΩ(1/poly(n))𝖰𝖢Πi(c,s).

Now the corresponding quantum-classical polynomial hierarchy is defined as follows.

Definition 11 (Quantum-Classical Polynomial Hierarchy (𝖰𝖢𝖯𝖧)).
𝖰𝖢𝖯𝖧=i𝖰𝖢Σi=i𝖰𝖢Πi.

See the full version for a survey of the previous work on quantum polynomial hierarchies.

Definition 12 (𝖰𝖠𝖢i0(s) circuits).

A circuit family {Cn}n=1 is in 𝖰𝖠𝖢i0(s) for some function s: if it satisfies the following:

  1. 1.

    Cn has i alternating layers of and gates, followed by a layer of quantum query circuits of size polylog(s(n))n

  2. 2.

    size(Cn) s(n)n where size of the circuit is the number of , gates plus the number of quantum query circuits in the bottom layer

Note that in the above definition, if the quantum query circuits are not computing total functions, then we treat each gate in the circuit as computing a partial function, whose output is evaluated in the following manner. For an input x to the circuit, a gate outputs 1 if at least one of its inputs is 1 regardless of the arbitrary 0/1 responses of the input partial functions whose promise is violated. It outputs 0 if all of its inputs are 0 regardless of the arbitrary 0/1 responses of the input partial functions whose promise is violated. Otherwise, the output of the gate is undefined. The evaluation of the output of gate is done in a similar manner. This method of evaluation gives the same output as the one if we evaluate the output of the circuit in a manner consistent with Definition 9 (or Definition 10). That is, a gate corresponds to an quantifier and a gate corresponds to a quantifier. We show this formally in the full version.

2.2 Query complexity

In query complexity, an algorithm (classical or quantum) is given black-box access to a string x{0,1}N, and is supposed to compute some boolean function f(x) by querying x as few times as possible. We have included the basic definitions of query complexity in the full version. Interested readers can refer to [8] for a detailed introduction to query complexity. Let N=2n. In the classical query model, an algorithm queries i{0,1}n and receives xi, which we will denote by Ox(i)=xi. In the quantum query model, an algorithm queries i{0,1}nαi|i and receives i{0,1}nαi(1)xi|i, which we will denote by Ox(i{0,1}nαi|i)=i{0,1}nαi(1)xi|i.

Definition 13 (Query Magnitudes [6]).

Given a quantum query algorithm Q computing a (partial) function f:{0,1}N{0,1,} (where N=2n) with T queries, define query magnitude at time step t, denoted mit(x), for t[T] and i{0,1}n as follows:

Q(x) =UT+1OxUTOxU1|0n|0w
mit(x) =Pr[Measuring query register of Ut+1OxUtOxU1|0n|0w gives i]
mi(x) =t=1Tmit(x)

We call mi(x) as the query magnitude of Q for bit i on input x. Since i{0,1}nmit(x)=1 for all t[T], we know that i{0,1}nmi(x)T (it need not be exactly equal since the quantum algorithm might make less than T queries on some inputs x).

Definition 14 (Partial Assignment).

A partial assignment p of length N is any string from the set {0,1,}N. A string x{0,1,}N is consistent with p if pi=xi whenever pi.

Definition 15 (Certificate).

Given a (partial) function f:{0,1}N{0,1,}, a partial assignment p{0,1,}N is a b-certificate for f if f(x)=f(y)=b for all x,yf1({0,1}) which are consistent with p. The length of this certificate p is the number of bits in p not equal to .

Definition 16 (Certificate Complexity).

Given a (partial) function f:{0,1}N{0,1,}, the certificate complexity of f at input x𝖽𝗈𝗆(f) is defined as C(f,x)=minp𝗅𝖾𝗇(p) where the minimization is over all partial assignments p consistent with x, which are also f(x)-certificates for f. Then the certificate complexity C(f)=maxx𝖽𝗈𝗆(f)C(f,x).

Definition 17 (Approximate Degree).

Given a (partial) function f:{0,1}N{0,1,}, the approximate degree of f, denoted 𝖽𝖾𝗀~(f), is the minimum degree of a multi-linear polynomial p which approximate f to error within additive error 1/3 for all input x𝖽𝗈𝗆(f) and p(x)[0,1] for all x{0,1}N.

Lemma 18 ([5]).

If a (partial) function f:{0,1}N{0,1,} is computable by a quantum algorithm Q making T queries, we have that there is a multilinear polynomial p of degree at most 2T such that p(x) is equal to the probability that Q outputs 1 on input x. Therefore p approximates f to error within additive error 1/3 for all input x𝖽𝗈𝗆(f) and p(x)[0,1] for all x{0,1}N and 𝖽𝖾𝗀~(f)2T.

2.3 𝗤𝗖𝗠𝗔 Hierarchy

In this section, we define the 𝖰𝖢𝖬𝖠 hierarchy (called 𝖰𝖢𝖬𝖠𝖧). It is a hierarchy of classes, where the ith level, called 𝖰𝖢𝖬𝖠𝖧i is defined as follows:

𝖰𝖢𝖬𝖠𝖧1 :=𝖰𝖢𝖬𝖠
𝖰𝖢𝖬𝖠𝖧i :=𝖰𝖢𝖬𝖠𝖰𝖢𝖬𝖠𝖧i1
𝖰𝖢𝖬𝖠𝖧 :=i=1𝖰𝖢𝖬𝖠𝖧i

Note that 𝖰𝖢𝖬𝖠 is a class of promise problems, thus when querying a 𝖰𝖢𝖬𝖠𝖧i oracle, any 𝖰𝖢𝖬𝖠 algorithm V is making queries to a promise oracle. Note that the 𝖰𝖢𝖬𝖠 algorithm V is different from the 𝖰𝖢𝖬𝖠 verifier (which takes a single proof as input and outputs 0/1), here V instead refers to the OR of all these 0/1 outputs of the verifier for every possible proof. If V makes a query on an input which violates the promise of the 𝖰𝖢𝖬𝖠𝖧i oracle, then the oracle can return an arbitrary 0/1 response. Therefore we need to be careful about when the behaviour of V is well-defined. In this paper, we follow the definition of well-defined behaviour proposed in [1] (note however that there can be many other reasonable definitions of well-defined behaviour, though we do not describe them here). The behaviour of V is well-defined, if the output of V remains the same regardless of the arbitrary 0/1 responses of the 𝖰𝖢𝖬𝖠𝖧i oracle on inputs violating the promise. In particular for a yes input, the 𝖰𝖢𝖬𝖠 verifier might accept only one proof for some choice of oracle responses outside the promise, and it might accept (say) half of the proofs for some other choice of oracle responses, but the behaviour remains well-defined as long as it accepts at least one proof for every choice of oracle responses.
In the rest of the paper, we will denote as 𝖰𝖢𝖬𝖠𝖧i verifier V as an i-tuple of 𝖰𝖢𝖬𝖠 verifiers V=V1,,Vi where V1 corresponds to the base 𝖰𝖢𝖬𝖠 verifier and the remaining i1 verifiers form the (i1)-tuple for the verifier corresponding to the 𝖰𝖢𝖬𝖠𝖧i1 oracle.

3 Polynomials and fractional block sensitivity

In this section, we establish a relationship between fractional block sensitivity and degree for real-valued functions. We will need this relationship to establish a switching lemma for approximate degree (the tools of this section are not necessary for the quantum switching lemma, which may instead rely on only the hybrid argument of [6], but we opt to prove the stronger switching lemma for polynomials).

3.1 Definitions for real-valued functions

When studying the degree of bounded real-valued functions, most of the literature uses the convention that the functions have signature {±1}n[1,1], known as the ±1 basis. Switching between {0,1} and {±1} does not affect most measures: we can interpret 1 as 1 and +1 as 0, and we can plug in 12xi in each variable to convert a {0,1} variable to a {±1} variable, or plug in (1xi)/2 to go in the reverse direction. This does not affect the degree. We will use the {±1} basis in this section.

Sensitivity.

The sensitivity of a real-valued function can be defined as follows. For a block B[n], define

s(f,x,B)|f(x)f(xB)|2,

where the notation xB refers to the string x with the block B[n] of bits flipped. Note that we divide by 2 because f can take values from [1,1] instead of [0,1]. This is the sensitivity of a specific block B to a specific input x{±1}n, with respect to the function f; it is a value between 0 and 1. We next define the sensitivity of f at x to be s(f,x) which is the total sensitivity of all the bits of x. We then use it to define sensitivity of f.

s(f,x) i=1ns(f,x,{i})
s(f) maxx{±1}ns(f,x)

For a Boolean function, this definition matches the usual definition of sensitivity s(f).

Block sensitivity.

We define block sensitivity analogously: bs(f,x) will be the maximum possible value of j=1ks(f,x,Bj) over choices of disjoint blocks Bj[n], and bs(f) will be the maximum possible value of bs(f,x) over choices of inputs x{±1}n.

Fractional block sensitivity.

We define fbs(f,x) to be the maximum possible value of the sum B[n]wBs(f,x,B), subject to the constraints wB0 for all B[n] and BiwB1 for all i[n]. (The weights wB represent fractions of a block, and the total weight of all blocks containing a bit i must be at most 1, making the blocks “fractionally disjoint”.) As usual, we define fbs(f)maxx{±1}nfbs(f,x).

Fractional certificate complexity.

The definition of fbs(f,x) is the optimal value of a linear maximization program in terms of the real variables wB. The dual of this linear program can is as follows. The variables are ci0, with the constraint iBcis(f,x,B) for all B[n]. The objective is to minimize i[n]ci. This can be interpreted as finding a fractional certificate, with ci representing the fraction with which bit i is used in the certificate, such that any block B which, when flipped, changes the value of the function by s(f,x,B) must be “detected” by the certificate in the sense that the certificate puts total weight at least s(f,x,B) on the bits of B.

Fractional certificate complexity is equal to fractional block sensitivity, so we will not give it new notation. However, we denoting by cx,i the fractional certificate for x, we get that for all x,y,

i:xiyicx,i|f(x)f(y)|2,i=1ncx,ifbs(f).

The factor of 2 comes from the use of [1,1] outputs for f; if f takes outputs in [0,1] instead (for example, if we consider the function f(x)=(1g(x))/2 where g takes values in [1,1] we are comparing the differences |f(x)f(y)| to the value of fbs(g)), we do not need to divide by 2.

3.2 Relationships to degree

Our results will rely on the following theorem by [9], which states that sensitivity lower bounds the squared degree even for a real-valued function. This theorem follows from results in approximation theory (though it is possible to get a version which is weaker by a constant factor using a proof analogous to the one for the discrete case).

Theorem 19 ([9]).

For any f:{±1}n[1,1], we have s(f)deg(f)2.

With this theorem in hand, we show in Corollary 20 that block sensitivity also lower bounds the squared degree (proven in full version).

Corollary 20.

For any f:{±1}n[1,1], we have bs(f)deg(f)2.

Finally, we show in Theorem 21 that fractional block sensitivity lower bounds the squared degree (proven in full version).

Theorem 21.

For any f:{±1}n[1,1], we have fbs(f)π24deg(f)2.

Theorem 7 follows.

4 Random restrictions for polynomials

In this section, we show that functions which have low fractional block sensitivity (and therefore, low approximate degree and low quantum query complexity) become close to a small-depth decision tree with high probability, after applying a sufficiently strong random restriction. This resolves an open problem in [1], asking whether functions with low approximate degree become close to a DNF on applying a random restriction. We first define a notion of f(x)-certificates for a function F:{0,1}N{0,1,}, which is approximated to error 1/3 by a real-valued function f.

Definition 22 (f(x)-certificate).

Let F:{0,1}N{0,1,} be a partial function which is approximated to error 1/3 by a real-valued function f. Then a pair (K,x) for K[N] is a 1-certificate for F on input x{0,1}N if for all y{0,1}N such that yi=xi for all iK, f(y)>1/2. In particular, the bits of x in K certify that F(x)0. Similarly, a pair (K,x) for K[N] is a 0-certificate for f on input x{0,1}N if for all y{0,1}N such that yi=xi for all iK, f(y)1/2.

A random restriction result was shown by [1] for quantum query algorithms, which they use to show that efficient QMA-query algorithms become close to a DNF in expectation over the choice of a suitably strong random restriction. We reprove this result (with minor change in parameters) for functions with low fractional block sensitivity in Lemma 23 (proof in full version). The proof is analogous to that of [1], with query magnitudes of a quantum algorithm replaced by the weight assignment for fractional certificate complexity.

Lemma 23 (Analogue of Theorem 60 of [1]).

Let f:{0,1}N{0,1,} be a partial function which is approximated to error 1/3 by a real-valued function f~ of fractional block-sensitivity F. Pick a set S[N] where each index i[N] is put in S independently with probability p. Choose an arbitrary k and x{0,1}N. Define τ=2p3/4Fk and K={iS:cx,i>τ}, cx,i is the fractional certificate of f~ on index i when the input is x. Then with probability at least 12ek/6 over choice of S, |K|k, |S|2pN and for all y{0,1}N with {i[N]:xiyi}SK, we have:

|f~(x)f~(y)|4p7/4FNk

Note that for Lemma 23, we have that |K|1 with probability at most k2p1/4 by a union bound. In particular, if k=polylog(N) and p is sufficiently small (say 1/N3/4), then with reasonably high probability over the choice of S (say 1/N2/3), none of the indices i such that cx,i>τ are included in S. Therefore, the restricted function becomes a constant function in this case.

Theorem 24.

Let f:{0,1}N{0,1,} be a partial function which is approximated to error 1/3 by a real-valued function f~ of fractional block sensitivity F. Pick a set S[N] where each index i[N] is put in S independently with probability p(k48FN)4/7, where k. Then

x,y{0,1}NPrS[g(y|S)=fρ(y|S)]1(2+k6)ek/6

where y|S is the string y restricted to indices in S, ρ is a p-random restriction defined from x and S as follows:

ρ(i)={if iSxiotherwise

and g is a width-k2 DNF dependent on ρ (defined explicitly in the proof). Note that if fρ(y|S)=, then g is allowed to output 0 or 1 arbitrarily.

Proof.

Choose arbitrary x,y{0,1}N, and choose S as described in the theorem statement to define ρ. Suppose f is approximated to error 1/3 by a real-valued function f~ of fractional block sensitivity F. Let C be the set of all 1-certificates for f~ after applying the restriction ρ. Then the DNF g is defined as follows (this is the same DNF that [1] show closeness to):

g(y)=(Kx,x)C|Kx|k2iKxyi=xi

Note that if fρ(y|S)=0, then g outputs 0 because it will not find a 1-certificate in y|S. In addition, we do not care about the output of g when fρ(y|S)=. So now we only worry about the case when fρ(y|S)=1. Define h(x)={i[N]:cx,i>τ}, where τ=2p3/4Fk and cx,i is the fractional weight of f~ on index i for input x. We now think of sampling S in stages (instead of all at once), and we start by sampling S0 for indices in h(x). Note that the set S is still sampled by putting each bit i in S independently with probability p, we only adopt this viewpoint of stages to analyze the sampling. In particular, since each bit is included in S independently, we can analyze the random restriction by looking at a subset of bits at a time and seeing whether they were included in S or not, without affecting the inclusion/exclusion of other bits, which we can analyze in the next step. Recall, we start by sampling S0 for indices in h(x). Define A0=, A1={iS0:xiyi} and xAi as the string x with bits in Ai flipped. Then S is sampled further in stage j as follows: Sj is obtained by sampling indices in h(xAj)lj1h(xAl), and Aj+1={iljSl:xiyi}. Note that j|h(xAj)|Fτ, and therefore by a union bound, |Sj|1 with probability at most pFτ=k2p1/4<1e over choice of Sj (we assume that k is sufficiently small, say k<N1/8). If |Sj|=0, then we stop the stage-wise analysis and make a decision for all the input bits we haven’t analyzed yet to get the full set S. Therefore, the probability that we reach stage k6 of the sampling to sample Sk6 is at most ek/6, since the sampling in each stage is performed independently of the previous stages. Note that the decision of whether or not to put any given bit in S is independent of the other bits, by stages of the sampling we only change the order in which we analyze this decision. Further, in each stage of sampling, by Lemma 23, |Sj|>k with probability at most ek/6, thus the probability that any of Sj has size more than k for j<k/6 is at most k6ek/6 by union bound. Therefore, with probability at least 1(k6+1)ek/6, Ak/6=Ak/61 and |lk/61Sl|k6k (since |Sj|k for all jk/61). Now we sample the remaining indices of S, and by Lemma 23, |S|>2pN with probability at most ek/6. So now we assume that Ak/6=Ak/61, |lk/61Sl|k6k and |S|2pN, which happens with probability at least 1(k6+2)ek/6. For convenience, we now set j=k/61. Define y{0,1}N as follows:

yi={yiif iSxiotherwise

Therefore y|S=y|S and f(y)=fρ(y|S). Let K=h(xAj)S. Note that KljSl, because Sj was sampled from bits in h(xAj)lj1h(xAl) and the bits of h(xAj) in lj1h(xAl) were sampled in lj1Sl. So |K|k6k because we assume |ljSl|k6k. From Lemma 23, |f~(xAj)f~(z)|1/12 for all z which differ from xAj only on indices in SK. In particular, y differs from xAj only on bits in SK, because xAj agrees with y on all bits in ljSl and K is a subset of these bits. Therefore |f~(xAj)f~(y)|1/12. Therefore, if f(y)=1, then f~(z)>1/2 for all z which differ from xAj only on bits in SK. Thus, (K,xAj) is a 1-certificate of size at most k26 for f~ when bits outside of S are fixed to those of x. Finally, y|S is consistent with this certificate, thus g(y|S)=1. Since our assumption holds true with probability at least 1(k6+2)ek/6, therefore, for every x,y{0,1}N, the statement in the theorem holds true. The argument above from Theorem 24 actually shows that a given 1-input of f is consistent with a small 1-certificate (of size k2) with high probability over the choice of unrestricted bits. A symmetric argument can be made for 0-certificates as well. Thus the theorem can be restated as follows:

Corollary 25.

Let f:{0,1}N{0,1,} be a partial function which is approximated to error 1/3 by a real-valued function f~ of fractional block sensitivity F. Pick a set S[N] where each index i[N] is put in S independently with probability p(k48FN)4/7, where k. Then

x,y{0,1}NPrS[Cy|S(fρ)k2]1(2+k6)ek/6

where y|S is the string y restricted to indices in S, ρ is a p-random restriction defined from x and S as follows:

ρ(i)={if iSxiotherwise

In particular, since we now have small certificates (with high probability) for the restricted function, we can construct a decision tree of small-depth which is correct with high probability over the choice of random restriction and a uniformly random input. We show this in Corollary 26 (proof in the full version).

Corollary 26.

Let f:{0,1}N{0,1,} be a partial function which is approximated to error 1/3 by a real-valued function f~ of fractional block sensitivity F. Pick a set S[N] where each index i[N] is put in S independently with probability p(k48FN)4/7, where k. Then

x,y{0,1}NPrS[g(y|S)=fρ(y|S)]1(2+k6)ek/6

where y|S is the string y restricted to indices in S, ρ is a p-random restriction defined from x and S as follows:

ρ(i)={if iSxiotherwise

and g is a decision tree of depth-k4 (described explicitly in the proof), dependent on ρ. Note that if fρ(y|S)=, then g is allowed to output 0 or 1 arbitrarily.

As a corollary of Theorem 24, we can also conclude that functions which have low quantum query complexity also become close to a decision tree of small depth, after applying a random restriction (using Lemma 18).

Theorem 27.

Theorem 24 also holds for partial functions f:{0,1}N{0,1,} of quantum query complexity T, when we set p(k48π2T2N)4/7 to obtain a width-k2 DNF.

Theorem 28.

Corollary 26 also holds for partial functions f:{0,1}N{0,1,} of quantum query complexity T, when we set p(k48π2T2N)4/7 to obtain a depth-k4 decision tree.

We discuss in the full version why our proof for the switching lemma from Theorem 27 does not directly extend to 𝖰𝖬𝖠-query algorithms, unlike Theorem 65 of [1].

5 Random projections for quantum query algorithms

We now consider the effect of random projections on quantum query algorithms, and we start by showing the following theorem on block random restrictions (where we restrict an entire block of bits at a time, instead of restricting individual bits) for quantum query algorithms. This proof is essentially the same as Theorem 24, see full version for details.

Theorem 29.

Let f:{0,1}N×l{0,1,} be a partial function computable by a quantum query algorithm making T queries to the input. Pick a set S[N] where each index i[N] is put in S independently with probability p=(k1922T2N)4/7, where k. Then

x,y{0,1}N×lPrS[g(y|S)=fρ(y|S)]1(2+k6)ek/6

where y|S is the string y restricted to blocks in S, ρ is a p-block-random restriction defined from x and S as follows:

ρ(i,j)={if iSxijotherwise

and g is a width-lk2 DNF dependent on ρ (defined explicitly in the proof). Note that if fρ(y|S)=, then g is allowed to output 0 or 1 arbitrarily.

We now state in Lemma 30 an average case version of Theorem 29, with respect to two (potentially distinct) block-product distributions from which the strings x and y are sampled (proof in full version). An analogous proof for random restrictions where both the distributions are uniform will recover Corollary 4. Note below that {1/2,11/2}l{1}l denotes the product distribution where each bit is set independently to or 1 with probability 1/2, conditioned on not setting every bit to 1.

Lemma 30.

Let D1=i=1ND1i and D2=i=1ND2i be two block-product distributions on {0,1}N×l. Let ρ be a p-block-random restriction with underlying distribution D1, where each block is sampled from {1/2,11/2}l{1}l with probability p and from the corresponding block of D1 otherwise. Let D be the distribution induced by D2 on the indices set to by ρ. Let f:{0,1}N×l{0,1,} be a partial function computable by a quantum query algorithm making T queries to the input. Set p=(k1922T2N)4/7, then there exists a DNF g of width-lk2 such that

𝔼ρ[PrzD[fρ(z)g(z)]](k6+2)ek/6

where g is allowed to answer 0 or 1 arbitrarily if fρ(z)=.

In particular, on applying 𝗉𝗋𝗈𝗃ρ𝗂𝗇𝗂𝗍 to a quantum query algorithm, the resulting function is close to a DNF of width k2.

Corollary 31.

Let D1=i=1ND1i be a block-product distribution on {0,1}N×l and D2=i=1ND2i be a product distribution on {0,1}N. Let ρ be a p-block-random restriction with underlying distribution D1, where each block is sampled from {1/2,11/2}l{1}l with probability p and from the corresponding block of D1 otherwise. Let D be the distribution induced by D2 on the indices set to by 𝗉𝗋𝗈𝗃ρ. Let f:{0,1}N×l{0,1,} be a partial function computable by a quantum query algorithm making T queries to the input. Set p=(k1922T2N)4/7, then there exists a DNF g of width-k2 such that

𝔼ρ[PrzD[𝗉𝗋𝗈𝗃ρ(f)(z)g(z)]](k6+2)ek/6

where g is allowed to answer 0 or 1 arbitrarily if 𝗉𝗋𝗈𝗃ρ(f)(z)=.

Proof.

Consider the DNF h of width-lk2 obtained from Lemma 30, and define g to be the DNF of width-k2 obtained as a projection of h, i.e., g=𝗉𝗋𝗈𝗃(h), so every variable in h from a given block is mapped to the same variable in g. Then the claim follows from Lemma 30.

5.1 Random projections for 𝗔𝗖𝟎 circuits

In our proof we rely heavily on the notation as well as results from [13], therefore in this section we mention all the notation and results we use. Note that we reference the arXiv version of the paper in the rest of the discussion.

Definition 32 (𝖲𝖨𝖯𝖲𝖤𝖱d functions [13]).

The 𝖲𝖨𝖯𝖲𝖤𝖱d function is a depth d read-once monotone formula with n variables, with alternating layers of AND and OR gates. The bottom layer (adjacent to input variables) consists of AND gates, so the root is an OR gate if d is even and AND gate otherwise. All the gates at a particular depth have the same fan-in, which is denoted by wi for gates of depth i. So the bottom fan-in is wd1 and the top fan-in is w0. The parameters tk correspond to the bias of the distribution for the random projection for depth k.

wd1 :=m
w :=m2m/loge
q :=p=2m/2=Θ(logww)
wi :=w for 1kd2
w0 :=mini{(1t1)qi12}=2mln(2)(1±om(1))
λ :=(logw)3/2w5/4
td1 :=pλq=Θ(logww)
tk1 :=(1tk)qwλq=Θ(logww) for 2kd1
n =1±om(1)loge(m2mloge)d1

Next we describe the addressing scheme used for the gates and input variables of 𝖲𝖨𝖯𝖲𝖤𝖱d (which we will also use for 𝖲𝖨𝖯𝖲𝖤𝖱d). Let A0={𝗈𝗎𝗍𝗉𝗎𝗍}, and for 1kd, let Ak=Ak1×[wk1]. An element of Ak specifies the address of a gate at depth k; so Ad={𝗈𝗎𝗍𝗉𝗎𝗍}×[w0]××[wd1] is the set of addresses of the input variables. For some string τ{0,1,}Ak={0,1,}Ak1×[wk1], use τa for aAk1 to denote the ath block of τ of length wk1.

The symbol {0p,11p}n denotes the random bit string of length n where each bit is sampled independently, and is 0 with probability p and 1 with probability 1p. The symbol {0p,11p}n{x} denotes the product distribution conditioned on not outputting the string x. For our setting, x will be either 0n or 1n. The notation x=a±b is shorthand for x[ab,a+b].

We modify the first random projection from that of [13], to replace the bottom layer of quantum circuits in 𝖰𝖠𝖢0 circuits by DNFs. Our subsequent random projections are the same as that of [13]. Since the random projections are defined adaptively based on the outcome of the previous random projection, they define the notion of lift of a random restriction, which tells us the value taken by each gate in the bottom layer of the circuit to which the corresponding random projection has been applied, and this is used to decide how to sample the next random restriction.

Definition 33 (Lift of a restriction, Definition 7 from [13]).

Let 2kd and τ{0,1,}Ak. Assume that the gates of 𝖲𝖨𝖯𝖲𝖤𝖱d (or 𝖲𝖨𝖯𝖲𝖤𝖱d) at depth k1 are gates (otherwise the roles of 0 and 1 below are reversed). The lift of τ is the string τ^{0,1,}Ak1 defined as follows: for each aAk1, the coordinate τ^a of τ^ is

τ^a={0if τa,i=0 for any i[wk1]1if τa={1}wk1if τa{,1}wk1{1wk1}.

To show that the 𝖲𝖨𝖯𝖲𝖤𝖱d function retains structure after applying a random projection, they define the notion of a typical restriction. Roughly, on applying a random projection corresponding to a typical restriction to 𝖲𝖨𝖯𝖲𝖤𝖱d, the depth of the formula is reduced by 1, and the bottom layer of gates of 𝖲𝖨𝖯𝖲𝖤𝖱d takes on values in {0,1,} such that the bottom fan-in of the projected formula is approximately qw. In addition, the fan-in of the layers above remains roughly the same. More generally, after applying a series of random projections to 𝖲𝖨𝖯𝖲𝖤𝖱d, all of which correspond to typical restrictions, applying the next random projection corresponding to a typical restriction ensures that the resulting formula has depth reduced by 1, bottom fan-in approximately qw and fan-ins of all other layers remain roughly the same. [13] show that each of their random restrictions is typical with high probability, assuming that the previous restriction is typical. We will show that the first random restriction which we redefine, is also typical with high probability, and therefore all the subsequent random restrictions remain typical using the results of [13]. Note that in the discussion above and the following definition, we talk about projections corresponding to restrictions which are typical, [13] however, talk about projections corresponding to restrictions for which the lift is typical. So though Definition 34 is identical to theirs, it is stated slightly differently here.

Definition 34 (Typical random restriction, Definition 14 of [13]).

Let τ{0,1,}Ak where 3kd. The restriction τ is typical if

  1. 1.

    (Bottom fan-in after projection qw) For all aAk2

    |τ^a1()|=qw±wβ(k1,d)whereβ(k,d):=13+dk112d
  2. 2.

    (Preserves rest of the structure) For all aAk3

    wk3w4/5|(τ^^a)1()|wk3

These conditions also imply that all the gates in Ak3 remain undetermined. This is because suppose τ is applied to a layer of gates. Then by condition 1, the only values that gates in Ak2 can get are or 1 (so gates in Ak3 have inputs from and 1). By condition 2, gates in Ak3 have at least one input, and since none of their inputs is 0, they remain undetermined.

Finally, [13] show that if an OR function (or its restriction) is close to unbiased under some input distribution where each bit of the input is independent and identically distributed, then it has a small correlation with CNFs of small width under this distribution.

Lemma 35 (Proposition 11.1 of [13]).

Let C:{0,1}n{0,1} be a CNF of width-r and τ{0,1,}n. Let 𝖮𝖱 be the function on n bits and 𝐘{01p,1p}n for p[0,1], then

Pr𝐘[(𝖮𝖱τ(𝐘)C(𝐘)]𝖻𝗂𝖺𝗌(𝖮𝖱τ,𝐘)rp

In order to get an average case depth-hierarchy theorem for 𝖠𝖢0 circuits, [13] show three properties for their sequence of random projections:

  1. 1.

    The sequence of random projections completes to the uniform distribution.

  2. 2.

    The target function 𝖲𝖨𝖯𝖲𝖤𝖱d remains “hard” to compute after applying the sequence of random projections.

  3. 3.

    The approximating circuit C trying to compute 𝖲𝖨𝖯𝖲𝖤𝖱d “simplifies” greatly after applying the sequence of random projections.

We first re-establish Property 1 after modifying the initial random projection to be applied. Then we establish Property 2 for our modified 𝖲𝖨𝖯𝖲𝖤𝖱d function, by showing that the effect of the modified initial random projection on 𝖲𝖨𝖯𝖲𝖤𝖱d is roughly the same as the effect of the initial random projection of [13] on 𝖲𝖨𝖯𝖲𝖤𝖱d. Since our subsequent random projections are the same as theirs, we can conclude that 𝖲𝖨𝖯𝖲𝖤𝖱d retains structure under this sequence of random projections. Finally, we use our random projection result for quantum query algorithm to show that the bottom layer of a 𝖰𝖠𝖢0 circuit simplifies after applying the initial random projection. Then we can use the results of [13] to conclude that the resulting 𝖠𝖢0 circuit also simplifies after applying the subsequent random projections.

5.2 Random projections for 𝗤𝗔𝗖𝟎 circuits

We start by showing an analogue of [10] for 𝖰𝖢𝖯𝖧 in Proposition 36 (proof in full version), to use a circuit lower bound for 𝖰𝖠𝖢0 to get oracle separation result for 𝖰𝖢𝖯𝖧. Note that [1] show a similar analogue for the 𝖰𝖬𝖠-hierarchy.

Proposition 36 (Analogue of [10]).

Let L{0} be some language decided by a 𝖰𝖢Σi verifier V with oracle access to O, and which has size at most p(n) for inputs of length n. Then for every n, there is a circuit C of size at most 2poly(n) and depth i+1, where each layer upto depth i is a layer of AND or OR gates, and depth i+1 contains p(n) sized quantum circuits with query complexity at most p(n), such that x{0,1}n

VO(x)=C(O[p(n)])

where O[p(n)] denotes the concatenation of bits of O on all strings of length at most p(n).

We now describe the modification of 𝖲𝖨𝖯𝖲𝖤𝖱d function (used in [13]) that we show the 𝖰𝖠𝖢0 lower bound for. The modified function which we use will be called 𝖲𝖨𝖯𝖲𝖤𝖱d and is exactly the same as 𝖲𝖨𝖯𝖲𝖤𝖱d defined previously, except for the bottom two fan-ins. The number of variables for 𝖲𝖨𝖯𝖲𝖤𝖱d will be denoted N.

w :=m2m/loge
wd2 :=qwN5/7
q :=1/N5/7
x :=1wd2w1/4
p1 :=x+q(pλq)
wd1 :=log2(p1)=polylog(N)
N =qq.nmlog2(1p1)

Note that N and n are polynomially related. We also redefine the first random projection that is applied to 𝖲𝖨𝖯𝖲𝖤𝖱d, in order to ensure that 𝖲𝖨𝖯𝖲𝖤𝖱d retains structure, while the quantum query algorithms become simple enough to be replaced by DNFs.

Definition 37 (Restriction for initial random projection).

The underlying random restriction ρ𝗂𝗇𝗂𝗍 on {0,1,}Ad1×[wd1] is defined as follows: independently for each aAd1

ρ(a){{1}wd1with probability x{1/2,11/2}wd1{1wd1}with probability q{01/2,11/2}wd1{1wd1}with probability 1xq

We will call this distribution on random restrictions as 𝗂𝗇𝗂𝗍.

We first show in Lemma 38 (proof in full version) that the projection defined using ρ𝗂𝗇𝗂𝗍𝗂𝗇𝗂𝗍 on composition with the subsequent random projections of [13] completes to the uniform distribution on {0,1}N.

Lemma 38 (Analogue of Lemma 8.2 of [13]).

Let ρ𝗂𝗇𝗂𝗍 and 𝐘{01td1,1td1}Ad1, and let the string 𝐗{0,1}N be defined as follows:

𝐗a,i={𝐘aif ρa,i=ρa,iotherwise

Then each bit of 𝐗 is independent and distributed uniformly at random.

The subsequent (d2) random projections applied to the 𝖲𝖨𝖯𝖲𝖤𝖱d function are the same as those of [13] (described formally in the full version). It was shown (in the proof of Proposition 8.4) in [13] that the subsequent (d-2) random projections after ρ𝗂𝗇𝗂𝗍 (along with a suitably chosen distribution for the unrestricted variables) complete to the distribution {01td1,1td1}(ρ𝗂𝗇𝗂𝗍^)1(). We state this formally in the full version. Therefore, we can conclude that the overall random projection completes to the uniform distribution, using Lemma 38.

For ease of writing, [13] use 𝚿(f) as notation for the resulting function after applying all the random projections sampled to a function f. We define this formally in the full version.

Corollary 39 (Proposition 8.1 of [13]).

Consider C:{0,1}N{0,1} be a circuit computing 𝖲𝖨𝖯𝖲𝖤𝖱d defined on N variables. Let 𝐗{01/2,11/2}N and 𝐘{01t1,1t1}w0 (we assume that d is even). If 𝚿 is sampled according to the sequence of projections,

Pr𝐗[𝖲𝖨𝖯𝖲𝖤𝖱d(𝐗)C(𝐗)]=Pr𝚿,𝐘[(𝚿(𝖲𝖨𝖯𝖲𝖤𝖱d))(𝐘)(𝚿(C))(𝐘)]

We now show that ρ𝗂𝗇𝗂𝗍 as sampled above, results in a typical ρ with high probability, in Lemma 40 and Lemma 41 (proof in full version).

Lemma 40 (Analogue of Lemma 10.3 of [13]).

Let τ=ρ^ so that τ{0,1,}Ad1. Then

Pr[|τα1()|=qw±w1/3] 1eΩ~(w1/6) αAd2
Lemma 41 (Analogue of Lemma 10.4 of [13]).

Let τ=ρ^ so that τ{0,1,}Ad1. Then

Pr[ww4/5|τ^α1()|w] 1eΩ(w4/5) αAd3
Lemma 42 (Lemma 10.6 and Lemma 10.8 of [13]).

For 2kd1, let τ{0,1,}Ak+1 be typical. Then ρ(τ^) is typical with probability at least 1eΩ~(w1/6).

On application of the overall sequence of random projections 𝚿, the function 𝖲𝖨𝖯𝖲𝖤𝖱d becomes an OR of fan-in at most w0 if d is even, and an AND of fan-in at most w0 otherwise. We will now assume that 𝖲𝖨𝖯𝖲𝖤𝖱d becomes an OR of fan-in at most w0, the case for AND is analogous. Then we sample the first (d-2) restrictions and assume they are all typical, which happens with probability at least 1deΩ~(w1/6) using Lemma 40, Lemma 41 and Lemma 42 and a union bound. We know from Definition 34 that the top OR gate of the function after applying the first (d-2) projections is undetermined and has fan-in at least w0w4/5. [13] then show that in expectation over the final random projection, the bias of the projected function is close to 1/2 under the distribution {01t1,1t1}w0.

Lemma 43 (Proposition 10.13 of [13]).

Let 𝐘{01t1,1t1}w0 and 𝚿(𝖲𝖨𝖯𝖲𝖤𝖱d) be the random projection of 𝖲𝖨𝖯𝖲𝖤𝖱d when 𝚿 is sampled according to the sequence of projections. Then

𝔼𝚿[𝖻𝗂𝖺𝗌(𝚿(𝖲𝖨𝖯𝖲𝖤𝖱d),𝐘)]12O~(w1/12)

Now we state and use the projection switching lemma of [13] (statement taken from their paper), to obtain our final circuit lower bound.

Theorem 44 (Proposition 9.2 of [13]).

Let 2kd1 and F:{0,1}Ak{0,1} be a DNF/CNF of width r. Then for all τ{0,1,}Ak and s1,

Prρ(τ)[𝖣𝖳(𝗉𝗋𝗈𝗃ρF)>s]=(O(rertk/(1tk)w1/4))s

We want to use Theorem 44 (d-2) times on the circuit obtained after applying the first random projection and replacing the layer of quantum circuits in a 𝖰𝖠𝖢d20 circuit with a small width DNF/CNF as per Corollary 31, to conclude that this circuit becomes a decision tree with high probability. Assuming that ρ𝗂𝗇𝗂𝗍=τ, we sample d2 random restrictions same as that of [13], and use 𝚿τ to denote the composition of the corresponding random projections. We define this formally in the full version.

Corollary 45 (Analogue of Proposition 9.13 of [13]).

For any constant d2, let C:{0,1}Ad1{0,1} be a depth-d circuit which has (or ) as output gate, with bottom fan-in w1/5 and size S2w1/5. Then for any τ{0,1,}Ad

Pr𝚿τ[𝚿τ(C) is a CNF (or DNF) of width at most w1/5]1eΩ(w1/5)

Proof.

Apply Theorem 44 (d-2) times on C, with r=s=w1/5, along with a union bound on the number of gates in the bottom layer (at most S) for each application of Theorem 44. Finally, we can combine all the above ingredients to show the desired circuit lower bound for 𝖰𝖠𝖢0 circuits in Theorem 46 (see full version for proof).

Theorem 46.

For any even constant d4, let 𝖲𝖨𝖯𝖲𝖤𝖱d be defined on N variables. Let C:{0,1}N{0,1} be any 𝖰𝖠𝖢d20 circuit of size S=𝗊𝗎𝖺𝗌𝗂𝗉𝗈𝗅𝗒(N)<2N16(d1), which has as its output gate. Then for a uniformly random input 𝐗, we have

Pr𝐗[𝖲𝖨𝖯𝖲𝖤𝖱d(𝐗)C(𝐗)]121NΩ(1/d)

An analogous proof follows for the case when d3 is odd, by replacing 𝖮𝖱 with 𝖠𝖭𝖣 (and with ) and flipping the values 0 and 1 in the relevant distributions. Note also that the function 𝖲𝖨𝖯𝖲𝖤𝖱d has bottom fan-in polylog(N). Therefore, using Theorem 46, for every d2, we can conclude that relative to a random oracle O, ΣdO𝖰𝖢Πd1O when d is odd, and ΠdO𝖰𝖢Σd1O when d is even. In particular, for every d2 and relative to a random oracle O, we have that ΣdO𝖰𝖢Πd1O. In addition, we can conclude that relative to a random oracle, 𝖰𝖢𝖯𝖧 is infinite (proof in full version).

Theorem 47.

With probability 1, a random oracle O satisfies ΣdO𝖰𝖢Πd1O for odd d3.

Corollary 48.

With probability 1, a random oracle O satisfies ΠdO𝖰𝖢Σd1O for even d2.

Proof.

Similar to Theorem 47, using the analogue of Theorem 46 for 𝖲𝖨𝖯𝖲𝖤𝖱d when d is odd.

Corollary 49.

With probability 1, a random oracle O satisfies 𝖰𝖢Σd+1O𝖰𝖢ΠdO for all d1, and therefore 𝖰𝖢𝖯𝖧 is infinite relative to O.

Proof.

Follows from Theorem 47, Corollary 48 and collapse theorem for 𝖰𝖢𝖯𝖧 [3]. We discuss in the full version how our switching lemma for quantum query algorithms (Theorem 27) compares to the switching lemma (Theorem 65) from [1], and why the latter is not sufficient to establish our oracle separation result.

6 Oracle Separation for 𝗤𝗖𝗠𝗔 Hierarchy

In this section, we show that there exists an oracle such that the 𝖰𝖢𝖬𝖠 hierarchy (called 𝖰𝖢𝖬𝖠𝖧) is infinite, and no fixed level of 𝖰𝖢𝖬𝖠 hierarchy contains all of 𝖯𝖧. We first redefine the 𝖲𝖨𝖯𝖲𝖤𝖱 function, which we will call 𝖲𝖨𝖯𝖲𝖤𝖱′′. The function 𝖲𝖨𝖯𝖲𝖤𝖱d′′ will be an AND-OR tree of depth 2d+2 (with AND gates at the bottom), and we will show that it can not be computed by a circuit corresponding to a 𝖰𝖢𝖬𝖠𝖧d machine. The exact parameters for 𝖲𝖨𝖯𝖲𝖤𝖱d′′ and some properties of the function are given in the full version.

We will apply 2d1 random projections. For 1id, we sample random restrictions ρi,1{0,1,}A2i+2 which acts on quantum query algorithms and ρi,2{0,1,}A2i+1 which acts on DNFs. These are defined formally and some properties of the projections are shown in the full version.

A proof very similar to Proposition 8.4 of [13] (stated in full version) and Lemma 38 can then be used to conclude that this sequence of random projections sampled completes to the uniform distribution. More strongly, the proof of Proposition 8.4 of [13] tells us that for 1id projections upto ρi,2 complete to the t2i+1 biased product distribution and projections upto ρi,1 complete to the t2i+2 biased product distribution. This fact will be useful later to be able to apply the projection switching lemma for quantum algorithms repeatedly.

In order to show that the 𝖲𝖨𝖯𝖲𝖤𝖱d′′ formula retains structure after applying each random projection, we modify the definition of typical restrictions (Definition 34) slightly, to account for the new parameters. We state the formal definition in the full version. We then need to show that the sequence of projections applied is typical. It follows from Lemma 40 and Lemma 41 that ρd,1 is typical with probability at least 1eΩ(w1/6). It follows from Lemma 42 that ρi,2 is typical, for 1id (note that we have changed the fan-ins of the function and hence the definition of typical restriction, but the proof for Lemma 42 works to establish this as well). We then show (proof in the full version) that ρi,1 is typical with high probability, for 1id1. This proof is very similar to that of Lemma 42 and can be adapted to show typicality for ρi,2 as well.

Finally, we show an analogue of [10] for the 𝖰𝖢𝖬𝖠 hierarchy in Proposition 50 (proof in full version), to use a circuit lower bound to get an oracle separation result.

Proposition 50 (Analogue of [10]).

Let L{0} be some language decided by a 𝖰𝖢𝖬𝖠𝖧i verifier V=V1,,Vi with oracle access to O, and where each Vj has size at most p(n) for inputs of length n. Let qi(n)=p(p(p(n)) where p is composed i times. Then for every n, there is a circuit C of size at most 2poly(n) and depth 2i, where the top layer is an OR gate and the layers alternate between OR gates of width 2qi(n), and quantum circuits of size qi(n) with query complexity at most qi(n) such that x{0,1}n

VO(x)=C(O[qi(n)])

where O[qi(n)] denotes the concatenation of bits of O on all strings of length at most qi(n).

The proof for the depth-hierarchy theorem now follows very similarly to the proof for 𝖰𝖢𝖯𝖧 (or 𝖰𝖠𝖢0 circuits) in Theorem 46. We noted earlier that the sequence of random projections completes to an appropriately biased product distribution at every level of the AND-OR tree, hence we can apply the projection switching lemma for quantum algorithms which works for arbitrary inner and outer product distributions. This switching lemma and the projection switching lemma for DNFs Theorem 44 by [13] are applied in an alternating manner to get a depth-hierarchy theorem (the projection switching lemma for quantum algorithms is applied for ρi,1 and the projection switching lemma for DNFs is applied for ρi,2). Then oracle separations analogous to the case for 𝖰𝖢𝖯𝖧 follow for 𝖰𝖢𝖬𝖠𝖧 using a standard diagonalization argument. Note however that now we will get an oracle separation between Σ2d+1 and 𝖰𝖢𝖬𝖠𝖧d relative to a random oracle O (whereas for 𝖰𝖢𝖯𝖧 we got an oracle separation between Σd+1 and 𝖰𝖢Πd).

References

  • [1] Scott Aaronson, DeVon Ingram, and William Kretschmer. The acrobatics of BQP. In 37th Computational Complexity Conference, CCC, 2022. doi:10.4230/LIPIcs.CCC.2022.20.
  • [2] Avantika Agarwal and Shalev Ben-David. Oracle separations for the quantum-classical polynomial hierarchy. CoRR, abs/2410.19062, 2024. doi:10.48550/arXiv.2410.19062.
  • [3] Avantika Agarwal, Sevag Gharibian, Venkata Koppula, and Dorian Rudolph. Quantum Polynomial Hierarchies: Karp-Lipton, Error Reduction, and Lower Bounds. In 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024), 2024. doi:10.4230/LIPIcs.MFCS.2024.7.
  • [4] Anurag Anshu, Shalev Ben-David, and Srijita Kundu. On query-to-communication lifting for adversary bounds. In 36th Computational Complexity Conference, CCC, 2021. doi:10.4230/LIPIcs.CCC.2021.30.
  • [5] Robert Beals, Harry Buhrman, Richard Cleve, Michele Mosca, and Ronald de Wolf. Quantum lower bounds by polynomials. J. ACM, 48(4):778–797, 2001. doi:10.1145/502090.502097.
  • [6] Charles H. Bennett, Ethan Bernstein, Gilles Brassard, and Umesh Vazirani. Strengths and weaknesses of quantum computing. SIAM Journal on Computing, 26(5):1510–1523, October 1997. doi:10.1137/s0097539796300933.
  • [7] Sreejata Kishor Bhattacharya. Aaronson-ambainis conjecture is true for random restrictions, 2024. doi:10.48550/arXiv.2402.13952.
  • [8] Harry Buhrman and Ronald de Wolf. Complexity measures and decision tree complexity: a survey. Theor. Comput. Sci., 288(1):21–43, 2002. doi:10.1016/S0304-3975(01)00144-X.
  • [9] Yuval Filmus, Hamed Hatami, Nathan Keller, and Noam Lifshitz. On the sum of the l1 influences of bounded functions, 2015. arXiv:1404.3396.
  • [10] Merrick Furst, James B. Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Mathematical systems theory, 17(1):13–27, 1984. doi:10.1007/BF01744431.
  • [11] Sevag Gharibian, Miklos Santha, Jamie Sikora, Aarthi Sundaram, and Justin Yirka. Quantum generalizations of the polynomial hierarchy with applications to QMA(2). Comput. Complex., 31(2):13, 2022. doi:10.1007/S00037-022-00231-8.
  • [12] Johan Håstad. Almost optimal lower bounds for small depth circuits. In Proceedings of the 18th Annual ACM Symposium on Theory of Computing, pages 6–20. ACM, 1986. doi:10.1145/12130.12132.
  • [13] Johan Håstad, Benjamin Rossman, Rocco A. Servedio, and Li-Yang Tan. An average-case depth hierarchy theorem for boolean circuits. J. ACM, 64(5):35:1–35:27, 2017. doi:10.1145/3095799.
  • [14] Avishay Tal. Properties and applications of boolean function composition. In Innovations in Theoretical Computer Science, ITCS, pages 441–454. ACM, 2013. doi:10.1145/2422436.2422485.