Abstract 1 Introduction 2 Preliminaries 3 Bootstrapping weak parity gates 4 Quantum MOD gates are powerful even on their own References Appendix A Deferred Proofs

Quantum Threshold Is Powerful

Daniel Grier ORCID University of California, San Diego, La Jolla, CA, USA Jackson Morris ORCID University of California, San Diego, La Jolla, CA, USA
Abstract

In 2005, Høyer and Špalek showed that constant-depth quantum circuits augmented with multi-qubit Fanout gates are quite powerful, able to compute a wide variety of Boolean functions as well as the quantum Fourier transform. They also asked what other multi-qubit gates could rival Fanout in terms of computational power, and suggested that the quantum Threshold gate might be one such candidate. Threshold is the gate that indicates if the Hamming weight of a classical basis state input is greater than some target value.

We prove that Threshold is indeed powerful – there are polynomial-size constant-depth quantum circuits with Threshold gates that compute Fanout to high fidelity. Our proof is a generalization of a proof by Rosenthal that exponential-size constant-depth circuits with generalized Toffoli gates can compute Fanout. Our construction reveals that other quantum gates able to “weakly approximate” Parity can also be used as substitutes for Fanout.

Keywords and phrases:
Shallow Quantum Circuits, Circuit Complexity, Threshold Circuits
Copyright and License:
[Uncaptioned image] © Daniel Grier and Jackson Morris; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Quantum complexity theory
; Theory of computation Circuit complexity
Related Version:
Previous Version: https://arxiv.org/abs/2411.04953
Acknowledgements:
JM thanks Farzan Byramji for helpful discussions about threshold circuits. We also thank anonymous reviewers for many useful comments. Part of this research was performed while the authors were visiting the Institute for Mathematical and Statistical Innovation (IMSI), which is supported by the National Science Foundation (Grant No. DMS-1929348).
Funding:
Part of this research was performed while the authors were visiting the Institute for Mathematical and Statistical Innovation (IMSI), which is supported by the National Science Foundation (Grant No. DMS-1929348).
Editors:
Srikanth Srinivasan

1 Introduction

To what extent are large multi-qubit gates useful for quantum computation? On the one hand, it is well-known that every multi-qubit gate can be decomposed into a circuit of simpler 1- and 2-qubit gates. On the other hand, this decomposition may introduce large overheads both in terms of gate count and circuit depth. Given that some multi-qubit gates might be experimentally feasible [22, 14, 16], it’s natural to ask what kinds of computational powers they unlock.

Specifically, we focus on the power of these large multi-qubit gates in constant depth. Such shallow circuits are experimentally appealing due to the possibility for less decoherence. Moreover, even shallow quantum circuits with 1- and 2-qubit gates are known to be surprisingly powerful, exhibiting quantum advantage in a variety of settings [4, 9, 25, 11]. Given the inherent complexity of simulating such circuits, there is the exciting possibility that augmenting these circuit models with large multi-qubit gates might lead to constant-depth implementations of practical quantum algorithms.

Much of the excitement about such circuit models is driven by a single gate – the multi-qubit Fanout gate – which is the quantum operation that copies classical information:

𝖥n|b,x1,,xn:=|b,x1b,,xnb

for all b,x1,,xn{0,1}.

This seemingly innocuous gate (the classical analogue of which is included for free in almost every classical circuit model) turns out to be quite powerful. For starters, it is locally equivalent via conjugation by Hadamard gates to the quantum Parity gate [17],

𝖯n|b,x1,,xn:=|b(x1xn),x1,,xn,

which is a duality that has no classical counterpart [1]. Moreover, there are constant-depth quantum circuits with Fanout (and arbitrary single-qubit gates) for a wide variety of other symmetric Boolean operations such as And/Or and Majority [13, 24]. Perhaps most impressively, constant-depth quantum circuits with Fanout gates can factor integers with polynomial-time classical post-processing [13].

Given the centrality of Fanout to the story of low-depth circuits with multi-qubit gates, there has been significant work in trying to understand if other multi-qubit gates are similarly powerful. Most notably, it is widely believed that the multi-qubit generalization of the Toffoli gate is fundamentally less powerful than the Fanout gate in constant depth, and there is long line of work giving evidence that these generalized Toffoli gates cannot compute Fanout [3, 6, 21, 19, 18, 2]. In some sense, all of these results are grappling with a fundamental tension in the study of these low-depth circuit models – the high entanglement in the states produced by these circuits is an obstacle to proving lower bounds, but it is simultaneously unclear how one could leverage this complexity to implement a useful quantum algorithm.

There is a surprising dearth of low-depth circuit models with multi-qubit gates that are as powerful as Fanout. One natural111This candidate looks considerably more natural after considering the analogous landscape of classical circuits, which we discuss in Section 1.1. candidate for a gate that could be as powerful as Fanout is the quantum Threshold gate, a multi-qubit gate parameterized by some value k:

𝖳𝗁n,k|b,x1,,xn:=|b𝕀|x|k,x1,,xn

where 𝕀|x|k indicates if the Hamming weight of the input bit string x=x1xn{0,1}n is at least the target value k. In fact, Høyer and Špalek asked almost 20 years ago about the power of Threshold in constant depth [13]: “Can we simulate unbounded fan-out in constant depth using unbounded fan-in gates, e.g. threshold[t] or exact[t]?” This question was reiterated more pointedly by Takahashi and Tani in 2011 [24]: “Does there exist a fundamental gate that is as powerful as an unbounded fan-out gate?”

We directly answer both of these questions in the affirmative by giving explicit constructions for Fanout using quantum Threshold gates:

Theorem 1.

There are poly-size constant-depth quantum circuits consisting of Threshold gates and arbitrary single-qubit gates that compute Fanout with high fidelity. Formally, 𝖡𝖰𝖳𝖢0=𝖡𝖰𝖭𝖢wf0.

The construction from this theorem actually reveals a number of other gates that are as fundamentally powerful as the Fanout gate. As it turns out, the salient feature of Threshold for our purposes is that it can be used to construct a sort of “weak” Parity gate – a gate that only acts non-trivially on inputs of the same parity.

Based on this idea, we introduce a class of multi-qubit phase gates that exhibit a generalization of this behavior. Formally, these gates are defined with respect to a set S{0,1}n in the following way:

US|x1,,xn:=(1)𝕀xS|x1,,xn.

Crucially, we restrict our attention to “parity-restricted” sets S, that is, sets where all elements have the same parity (i.e., x,yS|x||y|(mod2)). We show that these weak parity gates can be bootstrapped in constant depth into true Parity gates (which, recall, are locally-equivalent to Fanout) albeit with the help of a few generalized Toffoli gates:

Theorem 2.

Let {Sn}n be a family of parity-restricted sets with size |Sn|=Θ(2n/𝗉𝗈𝗅𝗒(n)). There are poly-size constant-depth quantum circuits consisting of USn gates, generalized Toffoli gates, and arbitrary single-qubit gates that compute Fanout with high fidelity.

Since it is widely believed that multi-qubit Toffoli gates are not themselves sufficient to implement Fanout, the power of this construction likely derives from the weak parity gates. In fact, the reason these Toffoli gates were not required for Theorem 1 is due to the fact that Threshold can directly simulate Toffoli. In that vein, we also give conditions under which the USn gates alone suffice to simulate Parity; namely, when |Sn|2nO(1) or |Sn|2(1ϵ)n. Though, the latter condition will result in circuits of super-polynomial size.

While it has long been thought that Fanout/Parity gates were morally equivalent to other quantum modular arithmetic gates, those constructions seem to also require these generalized Toffoli gates [8]. By a careful inspection of the original construction presented in [8] we find that generalized Toffoli gates are in fact not necessary. Formally, the quantum Mod-p gates are defined as

𝖬𝖮𝖣n,p|b,x1,,xn:=|b𝖬𝗈𝖽n,p(x),x1,,xn,

where 𝖬𝗈𝖽n,p(x) is 1 when p divides the Hamming weight of x=x1,,xn{0,1}n. For example, the Mod-2 gate is essentially the Parity gate (up to a single-qubit X gate). It is implicit in [8] that Fanout can be computed by a circuit consisting of Mod-p gates and one- and two-qubit gates, yielding 𝖰𝖭𝖢wf0=𝖰𝖭𝖢0[2]𝖰𝖭𝖢0[q] for all q2, but not necessarily that 𝖰𝖭𝖢0[p]=𝖰𝖭𝖢0[q] for distinct p and q. The result they make explicit is that when Toffoli gates are allowed, any Mod-q gate can be obtained using any other Mod-p gate (by first implementing Fanout with Mod-q gates and then computing Mod-p with Fanout and generalized Toffoli gates). Concretely; 𝖰𝖠𝖢0[p]=𝖰𝖠𝖢0[q] for p,q2. Only later was it shown that generalized Toffoli gates can be implemented using Fanout and single- and two-qubit gates, i.e., that 𝖰𝖭𝖢wf0=𝖰𝖠𝖢wf0 [13, 24]. In light of these results we observe the following:

Theorem 3.

For all p,q2, there are poly-size constant-depth quantum circuits consisting of Mod-p gates and single-qubit gates that compute the Mod-q operation. Formally, 𝖰𝖭𝖢0[p]=𝖰𝖭𝖢0[q].

1.1 Comparison to the classical setting

Our focus on shallow circuits draws considerable inspiration from the analogous study of classical constant-depth circuit classes with large fan-in gates, which has been hugely influential in classical complexity theory. For instance, initial work in Boolean circuits saw the development of techniques for proving unconditional lower bounds such as random restrictions [1, 7, 27, 12], Fourier analytic methods [15], and polynomial methods [20, 23].

So how do we compare the quantum and classical settings? And what does this comparison tell us about the power of quantum circuits in constant depth? To start, classical circuits classes (e.g., 𝖭𝖢0, 𝖠𝖢0, 𝖳𝖢0, …) typically assume that the output of any gate can be used as input for any number of other gates (i.e., a gate’s output can be “fanned out” to other gates). Of course, this is exactly the kind of fanout that immediately becomes so powerful when given to a constant-depth quantum circuit.

In fact, because of this fanout, the classical Threshold gate reigns supreme amongst similar classical circuit complexity classes. This is due to the fact that constant-depth classical circuits with Fanout and Threshold can compute any Boolean function where the output depends only on the Hamming weight of the input.222To see this, first notice that for any k, there is a constant-depth circuit with two Threshold gates that computes whether or not the input has Hamming weight exactly k. Since any symmetric Boolean function can be expressed as a disjunction over these “exact-k” clauses, the claim immediately follows due to the fact that a threshold of 1 is equivalent to the Or function. Formally, the complexity class 𝖳𝖢0, which contains all languages computed by constant-depth classical circuits with Threshold, contains all other similarly defined classical circuit classes with other large fan-in gates: 𝖭𝖢0[p], 𝖠𝖢0, 𝖠𝖢0[p], and 𝖠𝖢𝖢.333See Definition 14 for precise definitions. In many cases, Threshold is provably more powerful, e.g., 𝖠𝖢0𝖳𝖢0 [1, 7] and 𝖠𝖢0[p]𝖳𝖢0 [20, 23].

This is why the Threshold gate was a tantalizing target for quantum exploration. Prior to our work, it was not known whether the quantum version of 𝖳𝖢0 – i.e., 𝖡𝖰𝖳𝖢0 – was as powerful as the quantum versions of the other classical complexity classes. In fact, given the surprising power of Fanout in the quantum world, the exact opposite was known: 𝖡𝖰𝖭𝖢wf0𝖡𝖰𝖳𝖢0 [13]. That is, constant-depth quantum circuits with Fanout could simulate constant-depth circuits with Threshold. Our work restores order to the usually classical hierarchy, placing Threshold alongside Fanout as one of the most powerful quantum gates in constant depth: 𝖡𝖰𝖳𝖢0=𝖡𝖰𝖭𝖢wf0.

1.2 Proof techniques and overview

The constructions in Theorem 1 and Theorem 2 follow a general outline pioneered by Rosenthal [21]. There, it is shown that constant-depth quantum circuits can compute Fanout using generalized Toffoli gates provided exponential-sized circuits are allowed. While not phrased in this language, Rosenthal’s construction shows a proof-of-principle technique for taking a very “weak” Parity gate (indeed, Toffoli non-trivially computes Parity for exactly one input!) and boosting it to a full Parity gate. We show that when we start with a gate (like Threshold) which is closer to Parity, this construction can be altered to yield circuits of polynomial size.

The proof goes in two steps. First, define a certain cat-like state called a “nekomata” [21]:

|0n|ψ0+|1n|ψ12

where |ψ0 and |ψ1 are arbitrary states. Following a similar idea to that of Green et al.  [8], such states can be used to compute Parity in constant depth using the relative phase between the |0n and |1n part of the state.

Second, show there is an explicit constant-depth construction for a nekomata state. Here, we show the key ingredient is the ability to create a “noisy” version of a usual cat state, where the all-zeroes and all-ones outcomes have noticeably larger amplitudes than those on the other outcomes. Threshold gates are significantly better at this task than the Toffoli gates in Rosenthal’s original construction. Finally, these states can be combined together (using Toffoli or Threshold gates) to form a high-fidelity nekomata state, completing the construction.

1.3 Related work

Our work shares some similarity to that of [10], where the authors explore quantum advantage with constant-depth quantum circuits. They also make a similar claim suggesting that 𝖰𝖳𝖢0=𝖰𝖭𝖢wf0, but crucially, their results hold in a circuit model with intermediate measurements and classical fanout. The classical fanout in their circuit model allows them to bootstrap the poor man’s cat state construction of Bene Watts et al. [26] to construct an actual cat state, an idea that was also explored in [5]. To be clear, our circuit model and definition of 𝖡𝖰𝖳𝖢0 follows in a traditional line of work (e.g, [17, 8, 13, 24, 19, 21, 18]), where no such intermediate measurements or classical fanout is allowed. Therefore, we must use entirely different techniques.

1.4 Future directions

One immediate question left open by our work is whether the approximation error inherent in the construction used to prove Theorem 1 can be eliminated without incurring a size or depth blow-up. More generally, we ask which other conditions on a family of multi-qubit gates lead to powerful shallow circuits. One explicit approach would be to ask what properties of the sets S parameterizing our phase gates US are sufficient to compute Fanout. Is there a sufficient condition beyond being parity restricted?

Another interesting question concerns the circuit complexity of restricted families of threshold functions. Specifically, consider the Exact-k gate, which indicates if the Hamming weight of the input is exactly k. Notice that Exact-k can be constructed from two Threshold gates. Moreover, for kn/2, Exact-k can be used to compute Threshold. This latter statement is not obvious and follows from the fact that our proof of Theorem 1 actually uses Exact gates rather than Threshold gates. Indeed, for any constant α>0 and any k[nα,nnα] the gates 𝖤𝖷k/2, 𝖳𝗁k and 𝖯n are all equivalent under 𝖰𝖠𝖢0 reductions (see Theorem 22). However, it is unclear if this remains true for k which is sub-polynomial in n, say k=logn.

2 Preliminaries

We will now introduce the different types of entangling gates considered in this work, the types of circuits constructed from them, and the complexity classes to which they roughly correspond.

2.1 Multi-qubit Gates

A simple multi-qubit gate is the 𝖢𝖭𝖮𝖳 gate which acts on two qubits, flipping the target conditioned on the control, i.e.,

𝖢𝖭𝖮𝖳|x1,x2=|x1,x1x2

Any two-qubit gate can be constructed from constantly many single-qubit gates and 𝖢𝖭𝖮𝖳 gates. A circuit consisting entirely of arbitrary single- and two-qubit gates is said to be a 𝖰𝖭𝖢 circuit.

Another multi-qubit gate of interest is the Toffoli gate, which acts on three qubits by flipping the last qubit controlled on the first two, i.e.,

Tof|x,y,z=|x,y,(xy)z

This gate can be seen as a 𝖢𝖭𝖮𝖳 gate with an additional control qubit. In fact, we call the analogous unitary on n>1 qubits a generalized Toffoli gate:

Definition 4.

The generalized Toffoli gate n acts on n+1 qubits by computing the 𝖠𝖭𝖣 of the first n bits in superposition. For all x1,x2,xn,b{0,1} the n-gate acts as

n|x1,x2,xn,b=|x1,x2,xn,(x1xn)b

Circuits composed of arbitrary single-qubit gates and generalized Toffoli gates are referred to as 𝖰𝖠𝖢 circuits.

Definition 5.

For k{0,1n} and x1,x2,xn,b{0,1} the unitary 𝖳𝗁n,k acts as

𝖳𝗁n,k|b|x=|b𝕀|x|k|x

Circuits composed of arbitrary single-qubit gates and threshold gates444Recall that a function f:{0,1}n{0,1} is said to be a threshold function if it can be written as f(x)={1i=1nwixib0otherwise for some w1,wn,b. For our purposes it suffices to only consider threshold functions in which wi=1 for all i[n]. are said to be 𝖰𝖳𝖢 circuits. Note that by taking k=n we recover the generalized Toffoli gate, and in this sense the generalized Toffoli gate is a Threshold gate, so including this gate in the allowed gate-set for 𝖰𝖳𝖢 circuits would be redundant.

Note that in the above definition we allow for the use of any threshold gate in 𝖰𝖳𝖢 circuits, but this is actually not necessary, even in constant depth. As we will later show, it suffices to have a usual threshold, 𝖳𝗁n,k, or exact, 𝖤𝖷n,k, gate with k=𝗉𝗈𝗅𝗒(n) to recover the full computational power of 𝖰𝖳𝖢 as is defined above. In other words, any sufficiently powerful 𝖤𝖷n,k or 𝖳𝗁n,k gate can be leveraged to recover all other threshold and exact gates in constant-depth and polynomial-size. This reduction is non-trivial and combines our main result with the constant-depth constructions given in [13] for computing arbitrary threshold gates with fanout. A full explanation is given in Theorem 22.

Let Uf be the unitary which computes some boolean function f:{0,1}n{0,1} in superposition, i.e., for all x{0,1}n and b{0,1} Uf|x,b=|x,bf(x). Note that all multi-qubit gates discussed thus far fall into this category. Now, observe that when the target qubit is replaced with |=|0|12, we can “compute f in the phase”:

Uf||x=(1)f(x)||x

So, given Uf we can with a single ancilla implement Vf which acts as Vf|x=(1)f(x)|x. While going from Uf to Vf is not a difficult task the converse could in general be quite non-trivial.555For instance, take Zn; this gate computes parity in the phase as Zn|x=(1)|x||x, but it is unclear if there is a simple way to recover the usual parity gate: 𝖯n.

As mentioned, the quantum Fanout gate gives us some way of “copying” a given qubit and 𝖷𝖮𝖱-ing it onto an unbounded number of qubits.

Definition 6.

For all x1,x2,xn,b{0,1} the Fanout unitary, 𝖥n, acts as

𝖥n|b|x1,x2,xn=|bx1,bx2,bxn

We will refer to circuits constructed from one- and two-qubit and Fanout gates as 𝖰𝖭𝖢wf circuits.

Another important class of gates are so-called 𝖬𝖮𝖣 gates:

Definition 7.

For a given m and all x1,x2,xn,b{0,1} the 𝖬𝖮𝖣n,m gate acts as

𝖬𝖮𝖣n,m|x1,x2,xn|b=|x1,x2,xn|𝖬𝗈𝖽n,m(x)b

Where 𝖬𝗈𝖽n,m(x)=1 iff |x| is divisible by m. Further, for {0,1,m1} we use 𝖬𝗈𝖽n,m,(x) to denote the function which is 1 iff |x|(modm) and the corresponding quantum gate accordingly:

𝖬𝖮𝖣n,m,|b|x1,x2,xn=|𝖬𝗈𝖽n,m,(x)b|x1,x2,xn.

Note that when m=2 the 𝖬𝖮𝖣n,2,1 gate is equivalent to the parity gate 𝖯n. When a circuit consists of one- and two-qubit gates and 𝖬𝖮𝖣n,m gates for a fixed m it is called a 𝖰𝖭𝖢[m] circuit and when the circuit also contains generalized Toffoli gates it is referred to as a 𝖰𝖠𝖢[m] circuit.

The final class of gates we will define are what we call “parity-restricted” gates which have not previously appeared in the literature. A set of bit strings S{0,1}n is said to be parity restricted if |s1||s2|(mod2) for all s1,s2S.

Definition 8.

For a given set S{0,1} we define the n-qubit gate US,n as

US,n|x=(1)𝕀S(x)|x

Further, we call US,n a parity-restricted gate if |x||y|mod2 for all x,yS.

We will often drop the n subscript from US,n when discussing sets S which only consist of strings of the same length, i.e. S{0,1}n for some n.

For a given parity-restricted set S, a circuit consisting of arbitrary one- and two-qubit gates and US,n gates for n is said to be a 𝖰𝖭𝖢S circuit. Similarly, if the circuit also consists of generalized Toffoli gates the circuit is said to be a 𝖰𝖠𝖢S circuit. Indeed, the parity-restrict condition can be dropped, but in this work we will only consider S which are parity-restricted sets.

Finally, we will define the primary complexity measures for quantum circuits.

Definition 9.

A quantum circuit C is said to have depth d if C can be decomposed as a sequence MdSdM2S2M1S1 where each Si consists entirely of single-qubit gates and Mi consists of non-overlapping multi-qubit gates (i.e., every pair of gates in Mi operate on disjoint sets of qubits).

Definition 10.

A quantum circuit C has size s if C has exactly s multi-qubit gates.

2.2 Quantum Circuit Complexity Classes

In this section we will define the relevant quantum circuit classes, but before doing that we must introduce the notion of a circuit family and what it means for a circuit family to compute a Boolean function.

Definition 11.

A family of quantum circuits is a collection 𝒞={Cn}n1 where Cn acts on n+a(n) qubits where a(n) is some computable function.

This definition of a circuit family is analogous to the classical notion of a non-uniform circuit family since there need not be any relation between circuits for different sizes (e.g., it is not necessary for there to exist a Turing machine which outputs a description of Cn on input 1n. Such a requirement is only for uniform circuit families). It should be noted that all constructions presented in this work correspond to uniform circuit families nonetheless.

Definition 12.

For a given language L{0,1} we say that a family of quantum circuits {Cn}n1 each acting on n+a(n) qubits exactly computes L if for all n1 and x{0,1}n measuring the last qubit of Cn|x|0a(n) in the computational basis yields

  • |1 with certainty if xL

  • |0 with certainty if xL

Now, for the complexity classes of interest:

  • 𝖰𝖭𝖢i is the class of problems decidable by 𝖰𝖭𝖢 circuits which act on polynomially-many qubits (i.e. n+a(n) is bounded by some polynomial in n), have polynomial size and depth O(logi(n)).

  • 𝖰𝖠𝖢i is the class of problems decidable by 𝖰𝖠𝖢 circuits which act on polynomially-many qubits, have polynomial size and depth O(logi(n)).

  • 𝖰𝖳𝖢i is the class of problems decidable by 𝖰𝖳𝖢 circuits which act on polynomially-many qubits, have polynomial size and depth O(logi(n)).

  • 𝖰𝖭𝖢wfi is the class of problems decidable by 𝖰𝖭𝖢wf circuits which act on polynomially-many qubits, have polynomial size and depth O(logi(n)).

The primary focus of this work will be constant depth circuits, which correspond to i=0 in the above definitions, i.e., the classes 𝖰𝖭𝖢0, 𝖰𝖠𝖢0, 𝖰𝖳𝖢0, and 𝖰𝖭𝖢wf0.

Finally, we introduce two new complexity classes parameterized by sets S{0,1}:

  • For a given set S, 𝖰𝖭𝖢Si is the class of problems decidable by 𝖰𝖭𝖢S circuits which act on polynomially-many qubits, have polynomial size and depth O(login).

  • For a given set S, 𝖰𝖠𝖢Si is the class of problems decidable by 𝖰𝖠𝖢S circuits which act on polynomially-many qubits, have polynomial size and depth O(login).

Proposition 13 (Proposition 3.1 of [8]).

The following tasks are equivalent for constant-depth circuits consisting of n-gates and single-qubit gates:

  1. 1.

    Preparing the state |0n+|1n2 from |0n and performing the inverse transformation.

  2. 2.

    Applying Fanout 𝖥n.

  3. 3.

    Applying Parity 𝖯n.

In other words, these tasks are equivalent under 𝖰𝖠𝖢0 reductions.

Critical to our construction is the fact that (1) in the above proposition can be relaxed to a more general state preparation task. To see how, we must define a class of a “cat-like” states, first introduced by Rosenthal [21] which he calls nekomata:

Definition 14.

A state |ϕ on n+m qubits is said to be an n-nekomata if there exists some ordering of the qubits such that

|ϕ=|0n|ψ0+|1n|ψ12

where |ψ0 and |ψ1 are arbitrary m-qubit states. The first n qubits of this state are referred to as the target qubits.

As mentioned, Proposition 13 is still true when the cat state in task 1 is replaced with any n-nekomata (see Appendix A for more details). This fact is quite powerful since we only need to design a circuit which produces a state on which some subsystem is “cat-like” in order to compute parity. This makes the prospect of designing a circuit to compute parity far less daunting.

2.3 Approximate Quantum Circuits

Proposition 13 shows that exactly preparing a cat state is in fact computationally equivalent to exactly computing parity, up to some 𝖰𝖠𝖢0 computations and this can further be generalized by relaxing the task of preparing a nekomata state. Further, it is established in [21] that preparing an approximate nekomata state is sufficient to approximately compute parity or fanout. This notion is made precise below.

Definition 15.

For ϵ[0,1] a state |ϕ on n+m qubits is said to be an ϵ-approximate nekomata if there exists some nekomata |ν such that |ν|ϕ|21ϵ.

When we refer to a quantum circuit as approximately computing some function or approximating a given unitary we mean that the circuit, C, and the ideal unitary U have small distance. Explicitly, for ϵ(0,1) we say that C is an ϵ-approximate implementation of U or that C computes U with approximation error ϵ if UCopϵ where op denotes the operator norm.

A statement analogous to Proposition 13 holds for the approximate version of each task:

Lemma 16 (Theorem 3.1 of [21]).

For any ϵ(0,1) the following tasks are equivalent under 𝖰𝖠𝖢0 reductions:

  • Preparation of O(ϵ)-approximate nekomata from the all zeros state and the inverse transformation

  • Approximately computing Parity with error O(ϵ)

  • Approximately computing Fanout with error O(ϵ)

This lemma is again quite useful for us as circuit designers; now any circuit producing a state which has some subsystem that is approximately cat-like suffices to approximately implement fanout or parity.

Finally, we define the bounded-error analogues of the quantum circuit complexity classes introduced thus far:

Definition 17 (𝖡𝖰𝖭𝖢i).

A decision problem L{0,1} is in 𝖡𝖰𝖭𝖢i if there exists a family of 𝖰𝖭𝖢i circuits {Cn}n acting on n+a(n)=𝗉𝗈𝗅𝗒(n) qubits and a constant c>0 such that for all n and x{0,1}n measuring the last qubit of Cn|x|0a(n) in the computational basis yields

  • |1 with probability at least 2/3 if xL

  • |1 with probability at most 1/3 if xL

𝖡𝖰𝖠𝖢i, 𝖡𝖰𝖳𝖢i, 𝖡𝖰𝖭𝖢wf0, 𝖡𝖰𝖭𝖢Si, and 𝖡𝖰𝖠𝖢Si are defined similarly for their respective circuit classes.

3 Bootstrapping weak parity gates

In this section we will show that for any non-empty parity restricted set S{0,1}n the unitary US|x=(1)𝕀xS|x can be bootstrapped in constant depth to approximately compute Parity. This construction generalizes the constant-depth exponential-size 𝖰𝖠𝖢 circuit family given in [21]. As a corollary, we find that for any polynomial p, there exist poly-size 𝖰𝖳𝖢0 circuits which have fidelity 11/p(n) with Parity.

3.1 Grid Construction

Rather than directly computing Parity, the circuits described in this section will prepare approximate nekomata, which via Lemma 16 can be used to compute Parity and Fanout with high essentially the same approximation error, up to constant factors.

We will make use of the following lemma:

Lemma 18 (Lemma 4.3 of [21]).

Let |φ be a state with n “target” qubits that measure to all-zeros with probability at least 1/2ϵ and all-ones with probability at least 1/2ϵ. Then there exists an n-nekomata |ν such that |ν|φ|212ϵ.

Proof.

Suppose that the first n qubits of |φ are the targets. Then, the state

|ν=12b{0,1}|bnbn|𝕀|φ|bnbn|𝕀|φ

is an n-nekomata and

|φ|ν|2=(12b{0,1}|bnbn|𝕀|φ)212(1/2ϵ+1/2ϵ)2=12ϵ

As mentioned, a parity-restricted gate can be thought of as a “weak” parity gate in the sense that it correctly computes parity on some 1+ϵ2-fraction of the inputs. The idea behind our construction is to use these “weak” parity gates to prepare many bad, but not horrible approximate cat states in parallel. These bad, but not horrible cat states are of the form

|ϕ=p0|0n+p1|1n+ϵ|ω

where |ω is orthogonal to |bn for b{0,1}. These initial states are bad approximate cat states in the sense that they may have little overlap with the cat state, but aren’t horrible because the distributions corresponding to their measurement outcomes are peaked only at |0n and |1n. In the final stage of our construction we accrue the distributions on each of these bad cat states into some n target qubits using Toffoli gates. We then show that for the right choice of parameters this accruing step effectively amplifies the original bad, but not horrible, distribution given by each of the weak parity gates in parallel. The final result is a good approximate nekomata.

Figure 1: Constructing a nekomata from US and Toffoli gates. Target qubits are shown in blue. Here RψS is a rotation about the state |ψS=1|S|xS|x discussed further below.
Theorem 19.

For any parity-restricted S{0,1}n with |S|2n4 there exists a depth-4, O(n+22n|S|2)-size 𝖰𝖠𝖢S circuit that constructs an O(|S|222n)-approximate nekomata.

Proof.

The 𝖰𝖠𝖢S circuit will act on n(m+1) qubits arranged in a grid of width m+1 and height n. The first column will be designated as the “target” qubits, initialized to |0n and all other columns initialized to |1n (say, with a layer of X gates). To each non-target column apply an RψS=𝕀2|ψSψS| gate, where |ψS=HnUSHn|0n. Note that this can be implemented in depth-3 as

𝕀2|ψSψS|=HnUSHn(𝕀2|0n0n|)HnUSHn

which looks like the following quantum circuit:

Finally, apply a Toffoli gate along each row with the output qubit being the corresponding target qubit (i.e. the qubit in the first column). We will now show that the probability that the target column is measured (in the computational basis) as |bn is at least 12|S|2n2 for b{0,1}.

To start, let

γ0:=0n|ψS2 =0n|HnUSHn|0n2
=[0n|(|0n12n1sSx{0,1}n(1)x,s|x)]2
=(1|S|2n1)2

and

γ1:=1n|ψS2 =1n|HnUSHn|0n2
=[1n|(|0n22nsSx{0,1}n(1)x,s|x)]2
=(12n1sS(1)|s|)2
=|S|222n2

where the last line follows from the fact that S is parity-restricted. For b{0,1}, let pb be the probability that a given non-target column yields |bn after measuring in the computational basis. We have

p0 =0n|(𝕀2|ψSψS|)|1n2=4γ0γ1
p1 =1n|(𝕀2|ψSψS|)|1n2=(12γ1)2

Notice that, in fact, γ1 is maximized when S is parity-restricted (at least among sets of the same size as S). This, in turn, minimizes the probability that any given column is measured as |1n (though this will still be the most likely outcome since |S|2n4). Though, one may perform a similar analysis for sets S which are only approximately parity-restricted.

Recall that after we’ve created these states along the column, we apply Toffoli gates along the rows. Let’s now compute the probability that we measure all zeroes or all ones on the first column. Note that computational basis measurements commute with Toffoli gates of any size, so in order for the targets to be measured as |1n all other columns must also be measured as |1n. Let m=ln(2)2ln(12γ1), then

[Targets measure |1n] =(12γ1)2m
>(12γ1)ln(2)ln(12γ1)+1
=12(12γ1)
=12|S|222n2

Now, call a non-target column “bad” if it is measured as anything other than |0n or |1n. Via a union bound,

[Some column is bad]m(1p0p1) =m(14γ0γ1(12γ1)2)
=4mγ1(1γ0γ1)

Observe that

12(12γ1)=(12γ1)ln(2)ln(12γ1)+1(12γ1)2mexp(4mγ1),

so 4mγ1ln(12γ1)<1 as γ1116. So,

4mγ1(1γ0γ1) <1γ0γ1
|S|2n2

Thus, every column is good with probability at least 1|S|2n2 and the targets are measured as |0n with probability at least 1|S|2n2(12|S|222n2)12|S|2n2. By Lemma 18, the state produced is an |S|2n3-approximate nekomata. Also, note that m=Θ(1/γ1), hence the circuit has size O(n+m)=O(n+22n|S|2).

Note that |S| affects our construction in two distinct ways. First, the bound we obtain on the probability that some non-target column is bad is proportional to the size of S, i.e., |S|2n2. This exactly corresponds to the approximation guarantee on the final approximate nekomata state. In particular, this means that for very large parity-restricted sets the approximation guarantee becomes quite poor. On the other hand, we choose m=Θ(22n2|S|2) so that the entire grid is measured as |1nm with probability very close to 12. This means that the number of columns in the grid shrinks as |S| grows. The main takeaway is that this construction results in a tradeoff between the quality of the approximate nekomata state and circuit size depending on |S|.

It may seem peculiar that we require S to not be too large in Theorem 19, but this is indeed necessary. Intuition suggests that US should approximate the parity gate better when |S| approaches 2n1. In some sense, this is true: if |S|=2n1, then US exactly computes parity (or its negation) in the phase. In this extreme setting, unfortunately, US ceases to be a useful multi-qubit gate, as it is simply ±Zn. Roughly, as |S| grows too large, US no longer becomes useful for preparing column-states with the peaked measurement distributions exploited in the grid construction as a result of how weakly entangling the US gate becomes. However, if you weakly approximate parity as a classical reversible gate (instead of in the phase) there is another set of techniques which allow for very large parity-restricted gates to be bootstrapped to approximately compute parity. This is further discussed in Section 3.2.

It should also be noted that by applying Zn to US yields UT where T is a parity restricted set of size 2n1|S|, which can be used to achieve a similar grid construction with different size and accuracy parameters.

Finally, let’s compare our construction to that of Rosenthal [21], which only requires generalized Toffoli gates. If one takes S to be a singleton set then US is locally-equivalent to a generalized Toffoli gate and the resulting constructions have similar guarantees. However, one critical difference is the fact that the present construction has depth-4 while Rosenthal’s is only depth-2 since it uses column states HnUSHn|1n. Let us analyze roughly what goes wrong in the present construction if we were to use the same state. First, let dS:=|S|2n1 be the density of S. We have dS and 1dS are the amplitudes on the states |0n and |1n, respectively. Following the previous analysis, we find that all columns measure |1n with probability (1dS)2m and the probability that some column measures |0n is 1(1dS2)m. This means that if (1dS)2m12 then m=Θ(1/dS), but (1dS2)m tends to 1 when m=Θ(1/dS), so it seems that a depth-2 construction just using Hadamard and US gates may not be possible (at least with these techniques). However, it may be possible to obtain a depth-2 construction here by using some single-qubit gate other than the Hadamard (as Rosenthal does), but this will lead to a slightly more complicated analysis.

Corollary 20.

If S{0,1} is a parity-restricted set satisfying 2n|S{0,1}n|=O(na) and 2n|S{0,1}n|=Ω(nb) for some constants a,b>0 then 𝖡𝖰𝖠𝖢S0𝖡𝖰𝖭𝖢wf0.

Proof.

Observe that the construction in Theorem 19 yields a constant-depth 𝖰𝖠𝖢S circuit of size O(n+22n|S{0,1}n|2)=O(n+n2a)=𝗉𝗈𝗅𝗒(n) which prepares an O(nb)-approximate nekomata on n qubits. Via Lemma 16 this circuit can be leveraged to compute 𝖥n with approximation error O(nb). Note that we can make the error an arbitrarily small polynomial by making the circuit polynomially larger. That is, suppose we want to construct an O(nc)-approximate n-nekomata for some c>b. Simply use the construction above for an nc/b-nekomata, but only use n of the targets (i.e., any nc/b-nekomata is an n-nekomata for c>b). The circuit will have size O(nac/b) and error at most O(nc). Therefore, by Lemma 16, there are poly-size 𝖰𝖠𝖢S0 circuits to compute Fanout to arbitrary polynomial precision, so 𝖡𝖰𝖠𝖢S0𝖡𝖰𝖭𝖢wf0. Next, we argue that Threshold gates can be used to hit the “sweet spot” of Theorem 19. Namely, they correspond to parity-restricted sets of the right size to make the size and accuracy of the above construction polynomial.

Corollary 21.

𝖡𝖰𝖳𝖢0=𝖡𝖰𝖭𝖢wf0

Proof.

Without a loss of generality assume n is even, otherwise this can be rectified with a single ancilla qubit. Let S={x{0,1}n:|x|=n2}, i.e., S is the Hamming slice of weight n2. Note that |x|=n2 if and only if Maj(x1,,xn)=1 and Maj(x1,xn,0)=0, thus the US gate can be implemented in depth 2 by applying Majn once to |ψ, tacking on an ancilla qubit set to |0 then applying another Majn gate to |ψ|0. In this case the circuit from Theorem 19 has size s=O(n+22n|S|2) and produces an ϵ=|S|2n3-approximate nekomata. Since

(nn/2)[2n2n,2nnπ/2]

via Stirling’s formula, it follows that s=O(n) and ϵ=O(1n). Via Corollary 20, 𝖡𝖰𝖳𝖢0=𝖡𝖰𝖭𝖢wf0.

Theorem 22.

The following tasks are equivalent under 𝖰𝖠𝖢0 reductions:

  1. 1.

    Preparation of 1𝗉𝗈𝗅𝗒(n)-approximate nekomata from the all zeros state and the inverse transformation for some polynomial p

  2. 2.

    Approximately computing 𝖳𝗁n,k to error 1𝗉𝗈𝗅𝗒(n) for any k=𝗉𝗈𝗅𝗒(n)

  3. 3.

    Approximately computing 𝖤𝖷2k2,k1 to error 1𝗉𝗈𝗅𝗒(n) for any k=𝗉𝗈𝗅𝗒(n)

Proof.

Observe that for x{0,1}2k2 𝖤𝗑2k2,k1(x)=𝖳𝗁2k2,k1(x)𝖳𝗁2k2,k(x)¯ and we can implement this directly by first applying U to |x|10n(2k2)1 and then to |x|0n(2k2) for any U which computes 𝖳𝗁n,k to inverse-polynomial precision. Thus, (2) (3).

Via Theorem 19 a unitary V approximately computes 𝖤𝖷2k2,k1 to inverse-polynomial error can be leveraged in constant-depth to prepare an 1𝗉𝗈𝗅𝗒(k)-approximate nekomata on k qubits. In fact, this construction will yield an approximate nekomata on 2k2 qubits, but as mentioned in the proof of Corollary 20, an approximate nekomata on 2k2 is also an approximate nekomata on k qubits. This circuit (and the inverse transformation) can be used to compute 𝖥k to the same precision, up to constants (Lemma 16). Observe that iteratively applying 𝖥k in a tree-like fashion allows us to apply fanout to a number of qubits that is exponential in the depth. In particular, using 𝖥k gates we can construct 𝖥kd in depth d by stacking 𝖥k gates on the qubits which we have fanned-out onto in the previous layer (see Figure 2). This means that as long as k=nα for some α=Ω(1) we can take d=O(1/α)=O(1) and approximately compute 𝖥n to the same precision. Hence, (3) (1).

Finally, (1) (2) can be seen by applying the construction of [13] which allows us to approximately compute any threshold gate on 𝗉𝗈𝗅𝗒(n) qubits via a 𝖰𝖭𝖢wf0 circuit.

Figure 2: Iteratively applying Fk in a tree-like fashion. In the above, we achieve fanout k2+k in depth-2 using Fk gates.

3.2 Removing the Toffoli gates

The construction presented in Theorem 19 for generic S requires large Toffoli gates, however we will show that for some regimes of |S|, these gates are unnecessary, i.e., 𝖰𝖭𝖢S0 circuits can exactly compute Toffoli on polynomially many qubits. We will show that this is indeed the case when |S|2nO(1) and |S|2(1ϵ)n for a fixed constant ϵ<1. Though for this result we require the use of UfS gates, i.e., those gates which check membership in S as UfS|x|b=|x|f(x)b where f(x)=1 iff xS.

Lemma 23.

Let c be constant, and let S be a parity restricted set with size |S|2nc and strings of parity b{0,1}. Then, there exist some c1 bit-strings t1,t2,tc1{0,1}n such that |x|b(mod2) iff

ySpan(t1,tc1){xyS}

is satisfied.

Proof.

Suppose that b=0. Let 𝔽2n be the subspace of dimension n1 consisting of vectors of even Hamming weight. Observe that S, and moreover that S contains at least nc linearly independent elements of . Therefore, there exist some t1,tc1 such that S{t1,tc1} span . Hence, every element of can be written as sy for some sS and yspan{t1,tc1}. Thus, |x|0 iff the disjunction is satisfied.

If b=1 then the set S obtained by flipping the first bit of every element of S contains vectors of even Hamming weight - further, S contains at least nc linearly independent vectors. So, take {t1,tc1} as before such that S{t1,tc1} spans . Any vector of even Hamming weight, y𝔽2n, can be expressed as y=s+t for some sS and tSpan(t1,tc1). Now, observe that if x=(x1,xn)𝔽2n has odd Hamming weight then (x11,xn) can be expressed as s+t for some sS and tSpan(t1,tc1). Since s=(s11,s2,sn) for some s=(s1,s2,sn)S, it follows that x has odd Hamming weight iff the disjunction is satisfied. As shown above, if |S|2nc for some constant c then we can extend S linearly to “cover” all strings of a fixed Hamming weight. To implement a parity gate in this way, one can encode every linear combination tSpan(t1,tc) in the ancilla and then apply a UfS gate to |xt for every t - this can be done in constant depth using only UfS and 𝖢𝖭𝖮𝖳 gates since |Span(t1,tc)|=2c=O(1). If any of these UfS gates evaluate to 1 then x must have Hamming weight consistent with that of the strings in S, in effect computing the parity of x. We will now see how generalized Toffoli can be computed with just UfS and 𝖢𝖭𝖮𝖳 gates when |S| is sufficiently small.

Lemma 24.

For any S{0,1}n there exists some sS and some subset of indices {i1,i2,ik} such that s is the unique xS which satisfies xij=sij for all j[k]. Further, klog|S|.

Proof.

Note that unless |S|=1 there exists some index on which elements of S take different values. If |S|=1 we are done and can take this single element to be s. Otherwise, let i1 be the first index on which elements of S take different values. We will now partition S into two sets S01 and S11 where Sb1={sS|si1=b}. Take T1 to be the set with fewer elements and repeat this procedure, defining Tj similarly for j>1. Since |Tj+1||Tj|2 for j1, it follows that for some k>1, |Tk|=1 and it follows that klog|S|. Now, when |S| is sufficiently small, there is always some way to fix a small number (log|S|) of bits so that the unfixed bits have a unique assignment consistent with S. In particular, if |S|2(1ϵ)n for some constant ϵ(0,1) then there exists some partial assignment of at most (1ϵ)n bits such that for the remaining ϵn bits there is a unique assignment such that the resulting string is a member of S. For simplicity, suppose that the partial assignment is on the first (1ϵ)n as y{0,1}(1ϵ)n and that the unique assignment for the remaining bits which is consistent with S is z{0,1}ϵn. Now, for x{0,1}ϵn we can see that

UfS|y|x|0=|y|x|𝕀z(x)

In this way, after fixing the first (1ϵ)n bits to y forces UfS to act like a Uf{z} gate on the remaining qubits. This gate is locally equivalent to a Toffoli gate; we can just apply X gates to the wires on which zi=0. Since ϵ is a constant, we can repeat this procedure 1/ϵ times to implement the Toffoli gate on n qubits in constant depth. In this way, we can directly implement generalized Toffoli gates in the grid construction of Theorem 19 using the UfSn gates when |S| is sufficiently small.

Corollary 25.

For any parity restricted set S which satisfies |S|Ω(2n) or |S|2(1ϵ)n for some fixed ϵ(0,1) there exist constant-depth 𝖰𝖭𝖢S circuits of size O(n+22n|S|2) which prepare O(|S|222n)-approximate nekomata. In particular, for S{0,1} which are parity restricted and satisfy 2n|S{0,1}n|=𝗉𝗈𝗅𝗒(n), 𝖡𝖰𝖭𝖢S0𝖡𝖰𝖭𝖢wf0.

4 Quantum MOD gates are powerful even on their own

In this section we show a strengthening of a result of [8]. In particular they show that for any fixed q>1 𝖬𝖮𝖣n,q gates, n gates, single- and two-qubit gates can be leveraged to implement Fanout in constant depth and polynomial size:

Theorem 26 (Theorem 4.6 of [8]).

For p2, 𝖰𝖠𝖢0[p]=𝖰𝖠𝖢wf0.

However, the family of 𝖰𝖠𝖢0[p] circuits they construct does not actually require n gates, i.e., the family of circuits they construct to compute 𝖥n is actually a 𝖰𝖭𝖢0[p] family. This immediately yields the collapse of all 𝖰𝖭𝖢0[p]:

Theorem 27.

𝖰𝖭𝖢0[p]=𝖰𝖭𝖢wf0 for all primes p.

Combined with the results of [13] and [24] we have an even larger collapse of constant-depth circuit classes:

𝖰𝖭𝖢0[p]=𝖰𝖭𝖢wf0=𝖰𝖠𝖢wf0=𝖰𝖠𝖢0[q]=𝖰𝖳𝖢wf0

for all p,q2. Before showing and analyzing the construction we will introduce some preliminaries.

4.1 Simulating qudit arithmetic in 𝗤𝗡𝗖𝟎[𝒑]

By a proposition of Moore, in order to inplement 𝖥n it actually suffices to construct a circuit which behaves like 𝖥n when all but the one qubit to be fanned-out are set to |0:

Proposition 28 (Proposition 1 of [17]).

In any class of quantum circuits which includes Hadamard and 𝖢𝖭𝖮𝖳-gates, the follow are equivalent in constant depth:

  1. 1.

    It is possible to map (α|0+β|1)|0n1 to α|0n+β|1n and from α|0n+β|1n to (α|0+β|1)|0n1 for all |α|2+|β|2=1

  2. 2.

    𝖥n can be implemented with at most n1 ancilla qubits

  3. 3.

    𝖯n can be implemented with at most n1 ancilla qubits

Hence, constructing a unitary U which satisfies U(α|0+β|1)|0n1+a(n)=(α|0n+β|1n)|0a(n) for any single-qubit state α|0+β|1 will result in the ability to compute Fanout; this is exactly what the construction does.

First, for a fixed prime p consider the qudit generalizations of the Parity and Fanout gates for local dimension p:

𝖬n,p|b|x1x2xn =|b|x|modp|x1x2xn
𝖥n,p|b|x1x2xn =|b|(x1+bmodp),(x2+bmodp),,(xn+bmodp)

where x1,,xn,b{0,p1}.

Additionally, consider the following single-qudit gate:

Qp|b=1pj=0ωjb|j

where ω=e2iπ/p. For example, when p=2, Qp=H, and 𝖥n,p and 𝖬n,p are the usual Fanout and Parity gates for qubits, respectively.

Lemma 29 (Proposition 4.2 of [8]).

𝖬n,p=(Qp)(n+1)𝖥n,pQp(n+1)

Recall our goal: we want to use Mod-p gate to simulate Fanout over qubits. While this seems somewhat challenging for qubits, it is trivial over qudits of local dimension p by Lemma 29. Therefore, our plan will be to pretend that we are in that setting by encoding a qudit using several qubits. Once we have set an encoding, we need encoded versions of the 𝖬n,p and Qp gates in Lemma 29. Encoding the Qp gate is easy – it’s a gate of constant-size and each one of our encoded qudits will be of constant size, so any brute force encoding of Qp will do. The challenging step is to show that an encoded 𝖬n,p gate is possible using (qubit) 𝖬𝖮𝖣n,p gates. One of the key observations is that after we’ve applied the encoded Fanout, we will have accomplished Task 1 of Proposition 28, and therefore, we can construct general Fanout over qubits.

Proof of Theorem 27.

To start, define a linear encoding map E:p(2)p which maps from qudits of local dimension p to a tensor product of p qubits:

E|j=k=0p1|δk,j

for all j{0,,p1} and where δk,j denotes the Kronecker delta function. Note that E is a linear map of full rank from the p-dimensional space spanned by {|j}j=0p1 to the p-dimensional subspace of (2)n spanned by{k=0p1|δk,j}j=0p1. Throughout this construction we will only be working over qubits and not actually implementing E. Instead, we will be exploiting the equivalence of Lemma 29 by simulating qudit arithmetic with qubits. We introduce E for the sake of describing this simulation method succinctly.

Now, applying Qp to an encoded state on qubits amounts to implementing any Q~p which satisfies:

Q~p|δ0,j|δ1,j|δp1,j=1pk=0p1ωjk|δ0,k|δ1,k|δp1,k

i.e., any unitary Q~p on (2)p which respects the homomorphism induced by E. Since p is fixed, any such Q~p operates on constantly many qubits and can be implemented in constant depth and size.

Now, we must implement 𝖬n,p on the encoded subspace using just 𝖬𝖮𝖣n,p (and one- and two-qubit gates). Note that for any j1,jn𝔽p their sum modulo p can be decomposed j1++jn 0(δ0,j1++δ0,jn)+1(δ1,j1++δ1,jn)++(p1)(δp1,j1++δp1,jn) k=0p1k(i=1nδk,ji)(modp)
Let sk:=i=1nδk,ji(modp) be the number of ji terms equal to k modulo p. Let’s also define a family of generalized Mod-p gates over qubits. For {0,p1}, recall 𝖬𝖮𝖣n,p, acts as

𝖬𝖮𝖣n,p,|b|x1,xn=|b𝖬𝗈𝖽n,p,(x)|x1,xn

Notice that 𝖬𝖮𝖣n,p, can be implemented over qubits with an 𝖬𝖮𝖣n,p,0 gate (the standard 𝖬𝖮𝖣n,p gate on qubits) and p1 additional ancilla qubits, p of which are set to 1. We can use these gates to compute sk for a given k{0,1p1}:

For k{0,p1} the above circuit, can be applied in parallel to the appropriate qubits in the encoding; namely those of the form |δk,j for fixed k. This leaves us with the state E|s0E|sp1. Recall that the sum over 𝔽p we wish to compute is i=1nji=k=0p1ksk; so, if we can compute each of ksk and sum over all k, we will be left with the desired sum. However, the product ksk is over elements of 𝔽p and p is fixed, so it is clear that this can be computed in 𝖰𝖭𝖢0 (𝖭𝖢0 even). Further, k=0p1ksk is a sum of constantly many integers each described by p=O(1) bits, which is of course computable by a 𝖰𝖭𝖢0 (𝖭𝖢0 even) circuit.

For the sake of completeness, we will describe a circuit composed of permutations on p qubits which computes k=0p1ksk in our encoded subspace. First let Uσ be the permutation unitary which satisfies UσE|j=E|j1modp. For any k{0,1,p1}, Uσk can be implemented in constant depth via a sequence of at most p2 swap gates. Since UσaE|j=E|ja for all a,j𝔽p, we can in series apply Uσksk to E|b to finally achieve

(k=0p1Uσksk)E|b =Uσk=0p1kskE|b
=Uσi=1nji|b
=E|bi=1njimodp

Hence, this gives a circuit of depth p3=O(1) and linear size for simulating 𝖬p,n on the encoded qudits. After conjugating by Q~p gates on the appropriate groups of qubits the equivalence of Lemma 29 shows that the entire circuit exactly implements fanout on the encoded qudits.

Let’s now put all the pieces together to show that we can achieve Task 1 of Proposition 28. Starting with the state (α|0+β|1)|0(p(n+1)1), we want to get to an encoding of α|0+β|1 and the ancillary qubits. First, apply a 𝖢𝖭𝖮𝖳 gate from the first to second qubit, followed by an X gate on the first to obtain the state

(α|10p1+β|010p2)|0pn.

Now apply an X gate to the (pj+1)st qubit for j{0,1,n1} yielding the encoded state: E(α|0+β|1)E|0n. Note that this is a state on n+1 qudits encoded by p(n+1) qubits. After applying the previously described circuit which simulates 𝖥n,p on the encoded states the result is (up to a known permutation of the qubits)

(α|02(n+1)+β|12(n+1))|0(p2)(n+1)

Now, via Proposition 28 any such circuit is sufficient to compute Fanout (on qubits), thus 𝖰𝖭𝖢wf0𝖰𝖭𝖢0[p]. It is shown in [13] and [24] that the reverse inclusion holds and it can be concluded that 𝖰𝖭𝖢wf0=𝖰𝖭𝖢0[p].

Corollary 30.

𝖰𝖭𝖢0[a]=𝖰𝖭𝖢wf0 for all a>1.

Proof.

This follows from the previous construction by taking p to be any prime factor of a and setting ancilla qubits appropriately or concatenating the input a/p times so that any 𝖬𝖮𝖣a gate instead computes 𝖬𝖮𝖣p.

References

  • [1] Miklós Ajtai. Σ1-formulae on finite structures. Annals of pure and applied logic, 24(1):1–48, 1983.
  • [2] Anurag Anshu, Yangjing Dong, Fengning Ou, and Penghui Yao. On the computational power of 𝖰𝖠𝖢0 with barely superlinear ancillae. arXiv preprint, 2024. arXiv:2410.06499.
  • [3] Debajyoti Bera. A lower bound method for quantum circuits. Information processing letters, 111(15):723–726, 2011. doi:10.1016/J.IPL.2011.05.002.
  • [4] Sergey Bravyi, David Gosset, and Robert König. Quantum advantage with shallow circuits. Science, 362(6412):308–311, 2018.
  • [5] Harry Buhrman, Marten Folkertsma, Bruno Loff, and Niels MP Neumann. State preparation by shallow circuits using feed forward. arXiv preprint, 2023. arXiv:2307.14840.
  • [6] Maosen Fang, Stephen Fenner, Frederic Green, Steven Homer, and Yong Zhang. Quantum lower bounds for fanout. Quantum Information and Computation, 6(1):046–057, 2006. doi:10.26421/QIC6.1-3.
  • [7] Merrick Furst, James B Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Mathematical systems theory, 17(1):13–27, 1984. doi:10.1007/BF01744431.
  • [8] Frederic Green, Steven Homer, Cristopher Moore, and Christopher Pollett. Counting, fanout, and the complexity of quantum 𝖠𝖢𝖢. arXiv preprint, 2001. arXiv:quant-ph/0106017.
  • [9] Daniel Grier and Luke Schaeffer. Interactive shallow clifford circuits: quantum advantage against nc1 and beyond. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 875–888, 2020. doi:10.1145/3357713.3384332.
  • [10] Alex Bredariol Grilo, Elham Kashefi, Damian Markham, and Michael de Oliveira. The power of shallow-depth Toffoli and qudit quantum circuits. arXiv preprint, 2024. doi:10.48550/arXiv.2404.18104.
  • [11] Jonas Haferkamp, Dominik Hangleiter, Adam Bouland, Bill Fefferman, Jens Eisert, and Juani Bermejo-Vega. Closing gaps of a quantum advantage with short-time hamiltonian dynamics. Physical Review Letters, 125(25):250501, 2020.
  • [12] Johan Håstad. Computational limitations for small depth circuits. PhD thesis, Massachusetts Institute of Technology, 1986.
  • [13] Peter Høyer and Robert Špalek. Quantum fan-out is powerful. Theory of computing, 1(1):81–103, 2005. doi:10.4086/TOC.2005.V001A005.
  • [14] Harry Levine, Alexander Keesling, Giulia Semeghini, Ahmed Omran, Tout T Wang, Sepehr Ebadi, Hannes Bernien, Markus Greiner, Vladan Vuletić, Hannes Pichler, et al. Parallel implementation of high-fidelity multiqubit gates with neutral atoms. Physical review letters, 123(17):170503, 2019.
  • [15] Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, fourier transform, and learnability. Journal of the ACM (JACM), 40(3):607–620, 1993. doi:10.1145/174130.174138.
  • [16] Klaus Mølmer and Anders Sørensen. Multiparticle entanglement of hot trapped ions. Physical Review Letters, 82(9):1835, 1999.
  • [17] Cristopher Moore. Quantum circuits: Fanout, parity, and counting. arXiv preprint, 1999. arXiv:quant-ph/9903046.
  • [18] Shivam Nadimpalli, Natalie Parham, Francisca Vasconcelos, and Henry Yuen. On the pauli spectrum of 𝖰𝖠𝖢0. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1498–1506, 2024.
  • [19] Daniel Padé, Stephen Fenner, Daniel Grier, and Thomas Thierauf. Depth-2 𝖰𝖠𝖢 circuits cannot simulate quantum parity. arXiv preprint, 2020. arXiv:2005.12169.
  • [20] Alexander A Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mat. Zametki, 41(4):598–607, 1987.
  • [21] Gregory Rosenthal. Bounds on the 𝖰𝖠𝖢0 complexity of approximating parity. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Schloss Dagstuhl – Leibniz Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.ITCS.2021.32.
  • [22] Mark Saffman and Klaus Mølmer. Efficient multiparticle entanglement via asymmetric Rydberg blockade. Physical review letters, 102(24):240502, 2009.
  • [23] Roman Smolensky. Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proceedings of the nineteenth annual ACM symposium on Theory of computing, pages 77–82, 1987. doi:10.1145/28395.28404.
  • [24] Yasuhiro Takahashi and Seiichiro Tani. Collapse of the hierarchy of constant-depth exact quantum circuits. computational complexity, 25:849–881, 2016. doi:10.1007/S00037-016-0140-0.
  • [25] Barbara M Terhal and David P DiVincenzo. Adaptive quantum computation, constant depth quantum circuits and arthur-merlin games. arXiv preprint, 2002. arXiv:quant-ph/0205133.
  • [26] Adam Bene Watts, Robin Kothari, Luke Schaeffer, and Avishay Tal. Exponential separation between shallow quantum circuits and unbounded fan-in shallow classical circuits. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 515–526, 2019. doi:10.1145/3313276.3316404.
  • [27] Andrew Chi-Chih Yao. Separating the polynomial-time hierarchy by oracles. In 26th Annual Symposium on Foundations of Computer Science (sfcs 1985), pages 1–10. IEEE, 1985.

Appendix A Deferred Proofs

Proof of Proposition 28.

To see that (2) (3) it suffices to conjugate either gate by Hadamard gates, i.e., H(n+1)𝖥nH(n+1)=𝖯n. (2) (1) because Fn satisfies the condition described in (1) exactly:

Fn(α|0+β|1)|0n=α|0n+1+β|1n+1

and 𝖥n is its own inverse. Let C be any unitary which satisfies (1). To see that (1) (3) we construct a circuit using C and C in essentially the same way that we did in the proof of Proposition 13:

To see that this circuit exactly computes parity note that after the first Hadamard gate is applied the ancilla bits are in the state |0+(1)b|12|0n1 and after applying C we have |0n+(1)b|1n2. After the 𝖢𝖹 gates are applied we are left with the state |0n+(1)b+i=1nxi|1n2. After applying C we are left with |0+(1)b+i=1nxi|12|0n1, so the final Hadamard gate leaves the output qubit and the ancilla qubits in the state |bi=1nxi|0n1. It should be noted that when our circuit C has property (1) it is in some sense stronger than the guarantee C|0n=|0n+|1n2. In the case of the latter it seems that an 𝖮𝖱 gate is required to compute parity with C and C, which Proposition 28 shows is not necessary when C(α|0+β|1)|0n1=α|0n+β|1n for all one-qubit states α|0+β|1.

To see that (2) (1) note that

𝖥n|+|0n=|0n+1+|1n+12

Observe that if given access to some circuit C (and C) which satisfies the weaker condition of preparing an exact n-nekomata, i.e.,

C|0m=|0n|ψ0+|1n|ψ12

one can construct a constant-depth 𝖰𝖠𝖢 circuit which exactly computes Parity:

Figure 3: Computing 𝖯n with a circuit C which prepares an exact n-nekomata and its inverse C.

Note that in the below circuit after the first layer of 𝖢𝖹 gates are applied the state on the ancilla qubits is |0n|ψ0+(1)|x||1n|ψ12. When |x| is even, nothing has happened, so C will return the state on these registers to |0m and the -gate will not change the final register. When |x| is odd |0n|ψ0+(1)|x||1n|ψ12 is orthogonal to C|0m, so after applying C the resulting state is orthogonal to |0m which will always trigger the -gate. The second half of the circuit uncomputes returning the ancillary registers to |0m. Thus, the final register is always left in the state |bi=1nxi - thus, (up to an X gate) this circuit exactly computes parity.

Finally, we prove Lemma 16 and in particular that the below circuit approximates 𝖯n when C is replaced with any U which produces an ϵ-approximate n-nekomata when applied to the all zeros state.

Proof of Lemma 16.

Let U be a unitary on m qubits such that U|0m=|ψ is an ϵ-approximate n-nekomata and let |ν=|0n|ψ0+|1n|ψ12 be the n-nekomata on m qubits which maximizes |ν|ψ|2. We can write |ψ=1ϵ|ν+ϵ|ν for some |ν which is orthogonal to |ν. In the circuit shown in Figure 3 observe that after the first layer of 𝖢𝖹 gates are applied the state on the ancilla registers is

|ψ =1ϵ|0n|ψ0+(1)|x||1n|ψ12+ϵ|ν

For some |ν. Observe that when |x| is even then |ψ|ψ|212ϵ and when |x| is odd then |ψ|ψ|2=O(ϵ). In the former case C|ψ will have fidelity at least 1O(ϵ) with |0m, so after uncomputation we are left with |x,0m|ωb where |b|ωb|21O(ϵ). Similarly, when |x| is odd C|ψ has fidelity at most O(ϵ) with |0m and after uncomputation we are left with |x,0m|ωb where |b|ωb|2O(ϵ). Thus, on any input |x,b the state produced by this circuit has fidelity at least 1O(ϵ) with Fn|x,b - equivalently, the 2-distance is at most O(ϵ) meaning that the unitary implemented by this circuit, V, satisfies V𝖯nop=O(ϵ). Thus, (1) (3).

To see that (3) (2) suppose that the unitary U satisfies U𝖯nopϵ. Then,

H(n+1)UH(n+1)𝖥nop =H(n+1)(U𝖯n)H(n+1)opU𝖯nopϵ

For (2) (1) let |ψ=𝖥n|+|0n=|0n+|1n2 and |ϕ=U|+|0n. Note that if U𝖥nopϵ then

|ϕ|ψ2|+|0n2U𝖥nopϵ

So,

|ϕ|ψ2 =2ψ|ϕϕ|ψϵ|ψ|ϕ|21ϵ2ϵ4/4

Thus, |ϕ is an O(ϵ)-approximate nekomata.

Proof.

Proof of Lemma 29 This can be seen via a direct computation:

𝖥n,pQn+1|b|x =𝖥n,p(1pj=0p1ωbj|j)(1pny𝔽pnωx,y|y)

Here x,y denotes the inner product over vectors in 𝔽pn: x,y=j=1pxjyjmodp. For y𝔽pn and j𝔽p we will use y(j) to denote the string obtained by adding j to every entry of y, i.e., Fp|j|y=|j|y(j). Now we can see that

(1pj=0p1ωbj|j)(1pny𝔽pnωx,y|y)=1pn+1y𝔽pnj=0p1ωbj+x,y|j|y

So,

𝖥n,p1pn+1y𝔽pnj=0p1ωbj+x,y|j|y =1pn+1y𝔽pnj=0p1ωbj+x,y|j|y(j)

After rearranging we have

1pn+1y𝔽pnj=0p1ωbj+x,y|j|y(j) =1pn+1y𝔽pnj=0p1ωbj+x,y(j)|j|y
=1pn+1y𝔽pnj=0p1ωbj+x,yj|x||j|y
=1pn+1y𝔽pnj=0p1ωx,yωj(b|x|)|j|y
=Q(n+1)𝖬n,p|b|x

Thus, 𝖬n,p=(Qp)n+1𝖥n,pQpn+1 as claimed.