A State Distillation Protocol to Implement Arbitrary Single-qubit Rotations

An important task required to build a scalable, fault-tolerant quantum computer is to efficiently represent an arbitrary single-qubit rotation by fault-tolerant quantum operations. Traditionally, the method for decomposing a single-qubit unitary into a discrete set of gates is Solovay-Kitaev decomposition, which in practice produces a sequence of depth O(\log^c(1/\epsilon)), where c~3.97 is the state-of-the-art. The proven lower bound is c=1, however an efficient algorithm that saturates this bound is unknown. In this paper, we present an alternative to Solovay-Kitaev decomposition employing state distillation techniques which reduces c to between 1.12 and 2.27, depending on the setting. For a given single-qubit rotation, our protocol significantly lowers the length of the approximating sequence and the number of required resource states (ancillary qubits). In addition, our protocol is robust to noise in the resource states.

Given recent progress in quantum algorithms, quantum error correction, and quantum hardware, a scalable quantum computer is becoming closer and closer to reality.For many proposed quantum computer architectures, e.g., topological systems based on the braiding of non-Abelian anyons [1][2][3][4][5] or the surface code model based on code deformation [6,7], so-called Clifford operations and stabilizer state preparations or measurements can be implemented efficiently and accurately.However, these operations alone are not sufficient for quantum universality since they can be simulated classically [8][9][10].One technique to achieve quantum universality is to use magic state distillation [11][12][13] to augment the set with a single non-Clifford operation, e.g., the single-qubit π/8 gate, T .This augmented set can be used to approximate any single-qubit unitary using Solovay-Kitaev decomposition [14].
The Solovay-Kitaev theorem [15] states that for any and single-qubit gate U , U can be approximated to precision using Θ(log c (1/ )) gates drawn from a universal, discrete gate set, where c is a small constant.Stateof-the-art implementations of Solovay-Kitaev decomposition result in c ∼ 3.97 [14,16], resulting in an average decomposition sequence with hundreds to thousands of T gates [16].Each T gate requires a number of copies of a quantum magic state |H (Hadamard +1-eigenstate), where the number depends on the specific state distillation protocol and purity of the state [11][12][13].Therefore, it is especially important to minimize the number of T gates when decomposing a rotation, since each T gate requires additional ancillary qubits for implementation.
In this paper, we present an alternative protocol to Solovay-Kitaev decomposition that allows the implemen-tation of single-qubit rotations through the use of distilled magic |H states.We show that the resources required by our protocol are substantially fewer than the resources required by state-of-the-art implementations of Solovay-Kitaev decomposition [16], in both the number of gates and the number of quantum magic states necessary to apply an arbitrary single-qubit unitary.

I. MAGIC STATE DISTILLATION AND IMPLEMENTATION OF ROTATIONS
Solovay-Kitaev decomposition [14][15][16] enables the approximation of any gate using an approximately universal set of elementary gates, e.g., {H, T, Λ(X)}, where Λ(X) denotes the controlled-NOT gate.In particular, one can approximate any single-qubit unitary operation using the set {H, T }.Magic state distillation is then used to produce the magic |H states necessary to implement the T gate.
We call a state |ψ magic if given n noisy copies of |ψ and the ability to perform perfect Clifford operations, we can obtain, or "distill", a purer copy of |ψ from a Clifford circuit applied to the n noisy copies of |ψ .We can then obtain even purer states which are arbitrarily close to the perfect state by applying the protocol recursively [11][12][13].These distilled states can be used to implement non-Clifford operations, e.g., the T gate, as described below.
We briefly review how to perform an arbitrary rotation about the Z-axis using a resource state, and how to apply the T gate.We assume that Clifford operations are applied perfectly, since they can be implemented faulttolerantly, and that the resource states are very close to pure.One can use the protocols of [11,12] to perform the initial distillation in order to obtain arbitrarily pure resource states for input.We focus on the +1 eigenstate The circuit to implement a rotation around the Z-axis, presented in Fig. 1, leads to the two-qubit state Upon measurement of the first qubit in the computational basis, we obtain each with probability 1/2.Thus, the angle of rotation is chosen at random to be θ or −θ, up to global phase.An analogous circuit performs a rotation about the X-axis.Similar circuits can be found in [7].
As an important example of this procedure, consider the XY -plane version of the |H state: Using the circuit in Figure 1, we can implement a Zrotation of angle ±π/4, producing at random either the T gate or its adjoint, T † .In this particular case, we can deterministically apply the desired gate T or T † by applying the phase gate S, since ST † = T .In general, however, this deterministic correction will not be possible.

II. NEW STATES FROM |H STATES
In this section, we show that we can use a very simple two-qubit Clifford circuit to obtain other non-stabilizer states using only |H states as an initial resource, and then show that these states enable the approximation of any single-qubit rotation.We assume that we are provided with perfect copies of |H .We would like to minimize the number of |H states required to implement an arbitrary single-qubit rotation, since these distilled states can be costly to produce.Consider the circuit of Fig. 2. One can easily verify that it measures the parity of the two input qubits and decodes the resulting state into the second qubit.We begin by considering the two inputs to be |H states.We define θ 0 = π 8 and |H = |H 0 = cos θ 0 |0 + sin θ 0 |1 .The circuit begins as: Upon measurement of the first qubit, we have We define θ 1 such that from which we deduce cot θ 1 = cot 2 θ 0 .We define |H 1 = cos θ 1 |0 + sin θ 1 |1 , a non-stabilizer state obtained from |H states, Clifford operations, and measurements.If the outcome of the measurement is 1, then we obtain a stabilizer state and discard the output (see Fig. 2).
The two measurement outcomes occur with respective probabilities We now recurse on this protocol using the nonstabilizer states produced by the previous round of the protocol as part of the input to the circuit of Fig. 2. We define If we use as input a copy of |H i and a copy of |H 0 , we have Upon measurement of the first qubit, we have where Thus, if we measure 0, we obtain the state |H i+1 and if we measure 1, we obtain the state |H i−1 .The probability of measuring 0 is given by p 0,i = cos 2 θ i cos 2 θ 0 + sin 2 θ i sin 2 θ 0 .
Note that 3 4 ≤ p 0,i < cos 2 π 8 = 0.853 . ... We can view this recursive process as a semi-infinite random walk with biased non-homogeneous probabilities, as Fig. 3 illustrates.Every time a step is taken along this "ladder" of states, one copy of |H is consumed, except at the first node of the ladder (the production of state |H 0 ) when we require two copies of |H : if the outcome 1 is measured at the first node, we discard the output and start with two new copies of |H .
Table I lists the rotations obtained from the first few i recursions, using the |H i states, and Figs. 4, 5 illustrate.Note that there is a factor of two difference between the angle θ i involved in the description of the state and the rotation applied, e.g., the |H 0 state is over θ 0 = π 8 , and can be used to implement a π 4 rotation.Also, as 0 < θ i < π 4 (∀i), the discontinuity of the cotangent is never a problem.

III. NUMERICAL STUDY OF RANDOM Z-ROTATIONS
In this section, we numerically study the cost of implementing single-qubit rotations.Although the circuit in Fig. 1 randomly applies θ or −θ, we show that our protocol still results in an efficient application of the desired Z-rotation.
Assume that we have the ideal case, where θ i = 2θ i+1 .In this hypothetical scenario, if we try to apply some rotation using |H i and it fails, then we can correct the gate by applying a rotation using |H i−1 .If this gate also fails, then we follow with |H i−2 , and so on.There are two crucial facts to point out.First, the probability of failing n times in a row scales as 1/2 n , i.e., it decays exponentially with n, such that the expected number of iterations is well-behaved.Second, and very importantly, if a T gate fails (using a |H 0 state), we can always correct deterministically by applying the phase gate S (which is a Z-rotation by e iπ/2 ).
Unfortunately, θ i = 2θ i+1 ; however, this assumption is not too far from the truth (see Fig. 5).Since we only require that a gate be approximated to a given precision, we can actually apply any Z-rotation rapidly and with good accuracy.First, apply the |H i -rotation that gets you closest to your target angle and then recurse.The next gate to apply will depend on whether the current gate succeeded or not.Due to the property discussed in the previous paragraph, we will rapidly converge towards our target angle.
We now present simulation results of obtaining the Z- rotation Z(φ), where φ is chosen randomly, in order to characterize the efficiency of the protocol proposed in previous sections.The simulation proceeds as follows: 1. Set desired accuracy .
3. Find the state |H i such that 2θ i is close to φ.
4. Simulate an instance of the ladder to obtain that state and add its cost to the offline cost.
5. Apply a rotation using the |H i state and the circuit of Fig. 1 and add one to the online cost.
6. Recurse on steps 3 through 5 until the desired accuracy is reached.
We define the accuracy of the applied rotation V compared to the target rotation U as where is the trace distance between states ρ and σ.If U and V are rotations about the same axis, one can show that for small angles of rotation, which will always be our case, this reduces to the difference of rotation angles, = ∆φ.In [16], the distance measure used is In the case of rotations about the same axis, it can be reduced to 1 − | cos(∆φ)| ≈ ∆φ/ √ 2 for small ∆φ.This conversion between the two measures is important since we later compare performance.
To compare the resource cost of our protocol to Solovay-Kitaev decomposition, we define an online and offline cost to apply a unitary gate.The online cost, C on , is the expected number of non-Clifford gates, or |H i states, required to implement the unitary gate on a  qubit.The offline cost, C off , is the total number of distilled |H states required to obtain all of the intermediate |H i states used to perform the given unitary operation, that is, the sum of the |H states used for each ladder process.In our resource costs, we do not include the initial cost to distill |H states.For Solovay-Kitaev decomposition, the offline cost is always 0 and the online cost is the total number of T and T † gates in the decomposition.
We ran the simulation for target accuracies ranging between 10 −12 < < 10 −4 , each time considering a new random angle to produce a sample of ∼ 1.8×10 4 instances of this protocol.Just like in the case of Solovay-Kitaev decomposition, we suppose that where C on and C off are the online and offline costs, respectively, such that ln C on ∼ c ln ln( 1 ), ln C off ∼ c ln ln ( 1 ).
The results are given in Fig. 6.Fits are also presented from which we deduce that c ∼ 1.29 and c ∼ 2.27 for our protocol.
As discussed in Section V, both of these scalings represent a significant improvement over the best implementation, to our knowledge, of Solovay-Kitaev decomposition [16], which was itself a significant improvement over the previous implementation of [14].
Note that one can implement any single-qubit unitary U using three rotations around the X-and Z-axes [10]: for some angles α, β, γ.We have explicitly shown simulaton results for Z-rotations, however X-rotations can be obtained at the same cost using the X-rotation circuit given in Fig. 1.Thus we can use our protocol to produce each of the three rotations, and produce any desired single-qubit unitary operation.

A. Other states
To further reduce the resource costs and their respective scalings, we show that we can use different Clifford circuits to produce new non-Clifford states that can be used as initial resources for the previously presented protocols.We first introduce three new states ψ 0,1,2 0 and discuss how to combine them into the described protocol.
Consider the circuit of Fig. 7.It is a Clifford circuit to which we input four copies of |H .The measurement outcome 000 occurs with probability 3(2 + √ 2)/32 ≈ 0.320, otherwise the output is discarded.If the measurement yields result 000, then the produced state is Since the probability of success is 0.320 and that every trial consumes four copies of |H , the average cost to produce ψ 0 0 is 12.50 |H states.This circuit was designed to measure the stabilizer code presented in Table II.Another interesting state can be obtained from the same circuit, substituting one of the input states by a |+ state as is illustrated by Fig. 8.The measurement outcome 000 is obtained with probability (6 + √ 2)/32 ≈ 0.232.The corresponding output state is Since the probability of success is 0.232 and every trial consumes three copies of |H , the average cost to produce ψ 1 0 is 12.95 |H states.Fig. 9 presents another useful circuit.The measurement outcome 000 is obtained with probability 11/32 ≈ 0.344.The corresponding output state is Circuit to produce ψ 0 0 states.The probability of success is 0.320 and every trial consumes four copies of |H such that the average cost is 12.50 to produce a copy of ψ 0 0 .
The stabilizer code decoded by the circuit of Fig. 7.
The probability of success is 0.344 and every trial consumes four copies of |H such that the average cost to produce ψ 2 0 is 11.64 |H states.Table III presents the stabilizer code in terms of its generators S that are decoded by the circuit.
We will use these states as input states to the circuit given in Fig. 2, where one of these states is used in place of the |H 0 input state.We start with a copy of ψ i 0 and a copy of |H 0 .If measurement outcome 1 is obtained, the state is discarded.Otherwise, we obtain Similarly to the |H i states, we define If we input a copy of ψ j i and a copy of |H 0 , we obtain states.The probability of success is 0.232 and every trial consumes four copies of |H such that the average cost is 12.95 to produce a copy of ψ 1 0 .
states.The probability of success is 0.344 and every trial consumes four copies of |H such that the average cost is 11.64 to produce a copy of ψ 2 0 .
The stabilizer code decoded by the circuit of Fig. 9.
such that the output state obtained is, depending on measurement outcome, New "ladders" of states can be obtained using the ψ 0,1,2 0 states as inputs in place of the |H 0 states.Fig. 10 shows the four ladders.Table IV lists the rotations obtained from the first few i recursions and Fig. 11 illustrates.We see that the set of possible rotations is more dense.We reproduced the numerical experiment of Section III with basic offline costs 12.50, 12.95 and 11.64 for ψ 0 0 , ψ 1 0 , and ψ 2 0 , respectively.The results are presented in Fig. 13.Since the set of states is denser, we expected improved scalings for both the online and offline costs.This is indeed the case, we find c ∼ 1.12 and c ∼ 1.75.
However, the basic offline costs of our new states ψ i 0 are significantly higher; for precision ∼ 10 −4 , even though the online cost is smaller using the new states, the offline cost is still smaller if we restrict ourselves to the simpler scheme using only |H states.For the protocol using the new input states to reduce both the online and offline costs, we need to consider precisions smaller then ≈ 1.28 × 10 −5 , see Fig. 12.
The reason why we do not consider other measurement outcomes in the circuits considered in this section is that in general, potential errors on the |H states are amplified by the circuit.Output states in these cases might still prove useful, but a careful analysis of the evolution of errors must be conducted.

B. Minimizing the online cost
In this section, we aim to minimize the online cost, at the price of potentially increasing the offline cost.
The protocol up to this point can be summarized as follows.Suppose one wants to implement a Z-rotation of an arbitrary angle φ on a logical state |ψ .One has to implement a sequence of j rotations {Z(2θ ij )} on |ψ using the sequence of states { H ij }, such that Z(φ) ≈   FIG. 13.Target accuracies are chosen such that 10 −12 < < 10 −4 and the sample size is ∼ 1.8 × 10 4 .The clouds of points are used to fit the data according to a linear fit.We obtain c ∼ 1.12 and c ∼ 1.75, improving on the results of Fig. 6. j Z(2θ ij ).The online cost is given by { H ij } .Consider instead the following protocol to implement the same rotation by angle φ.Prepare offline the state |Z(φ) from a copy of |0 .To achieve this, use the protocol described in the previous paragraph to rotate |0 to |Z(φ) .Note that you can implement it offline because you are applying rotations on an ancilla state.Then, use |Z(φ) online to apply the desired rotation.With probability 1  2 , the rotation Z(φ) is applied and the online cost is 1.If it fails, correct for it by preparing offline the state |Z(2φ) .Again, with probability 1  2 , the overall rotation Z(φ) is applied and the online cost is 2. If it fails, prepare offline |Z(4φ) , and so on.The probability that a number n of iterations is required before success de- creases exponentially with n.This process is a negative binomial of parameter p = 1 2 and the expected number of online rotations before success goes as ∼ 1 p = 2.We numerically simulated this process for various random angles, 0 < φ < 2π, and accuracies, 10 −12 < < 10 −4 .As Fig. 14 illustrates, we note that there is no significant change in the online cost for different values of .Moreover, the expected number of online rotations to apply is roughly two, the result one would expect in this situation.The scaling of the offline cost is the same as that of the previous scheme, as expected.Also, even though the scaling is the same, the actual values of the offline cost are bigger.The shift is 1.13 − 0.64 = 0.59, see Figs. 13(b) and 14(b), which corresponds to a factor of e 0.59 ≈ 1.80.One would expect a slightly bigger shift of the offline cost by ln 2 since we repeat the scheme twice on average.This suggests that there might exist some favorable correlations between the expected offline cost of an angle θ and of twice that angle 2θ.

IV. ERRONEOUS STATES
In this section, we determine the effect of errors on our resource |H states, and their effect on the produced |H i states.A priori, the errors might be amplified by the two-qubit circuit of Fig. 2, however we show this is not the case.
presented protocol is not well suited for an analytical study of the evolution of errors on |H i states.Instead, we rely on a numerical study for three different types of errors.We use the trace distance on states ρ and σ, to measure the accuracy of the imperfect |H i states.We assume Clifford operations are perfect and that errors can only occur on the |H states.We consider three types of erroneous states.First, we assume that the mixed state, ρ a 0 , is perfectly along the line joining the center of the Bloch sphere and the the perfect state, i.e., is the state orthogonal to |H 0 .We denote the imperfect version of |H i obtained from ρ a 0 states as ρ a i .If Clifford operations are perfect, we can always bring any mixed state into this form using twirling [12].However, for the protocol to be of practical interest, we require it to remain stable under the two following types of errors, where we assume that the state is pure, but that the rotation is slightly off of the desired axis by δ: We numerically generated pseudo-random instances of the scheme to produce |H i states for different values of i and for different noise strengths.We considered 1000 instances for each of the three types of errors and noise strengths 10 −4 , 10 −6 and 10 −8 .Figures 15 and 16 show that the protocol actually reduces the amplitude of possible errors on the resource |H states, such that if we start with |H meeting our target accuracy, all the subsequent derived |H i states will also meet it.This even suggests that for bigger values of i, one could use noisier |H states and still achieve the desired accuracy.This could make a dramatic difference if it enables one to reduce the number of distillation recursions necessary to prepare the |H states.
We note a very similar behavior for the three types of errors.The exponential decay of the distance between erroneous and ideal states confirms that the errors are well behaved under the proposed protocol.We note that the bases for the exponential decay of the errors are comparable, but smaller, than the basis for the exponential decay of the angle implemented.So, for a given error rate, there exits a point where the angle of rotation implemented by |H i for some i is going to be comparable or smaller to the error on that angle.However, this is not a problem in practice since for, e.g., ∼ 10 −4 , we find i = 150.For this value of i the angle is θ i ∼ 10 −57 .

V. COMPARISON TO SOLOVAY-KITAEV DECOMPOSITION
In this section, we compare the performance of the Solovay-Kitaev decomposition of [16] and that of the schemes presented in this article.In order to do this, we first consider different Z-rotations of angles π/16, π/128, and π/1024 and different accuracies 10 −4 , 10 −8 and 10 −12 .Table V lists the expected costs.C on and C off are respectively the online and offline costs to implement the gate using only |H states.C on and C off refer to the costs for the version of the scheme thats uses {|H , ψ 0 0 , ψ 1 0 , ψ 2 0 } as initial resource states.C SK refers to the extrapolated cost to implement these gates using the results from [16].This extrapolated cost averages over all unitaries (c ∼ 3.40).This is optimistic since the results of [16] suggest that Z-rotations are actually harder to implement (c ∼ 4.3).Note that in theory, the cost of this algorithm is O(log c (1/ ), where c = 3.97 [14].In all cases, the online cost is minimal when our proposed scheme enhanced by { ψ 0 0 , ψ 1 0 , ψ 2 0 } is used.For rougher precision, e.g., 10 −4 , the offline cost might be such that the total cost is still minimal for the Solovay-Kitaev implementation of [16].For finer precision, e.g., 10 −8 or 10 −12 , our proposed protocol becomes very advantageous, since the cost of the Solovay-Kitaev decomposition becomes prohibitive.
We also compare the average behavior of the different schemes as Figs.17 and 18 illustrate.We first start by noting that, loosely speaking, a random unitary is composed of three random rotations, such that the curves presented previously in Fig. 12 must be shifted by ln 3 in general.Fig. 17 plots the fit for the Solovay-Kitaev decomposition (solid line), the online cost (dashed) and of- refer to the costs for the version of the scheme thats uses {|H , ψ 0 0 , ψ 1 0 , ψ 2 0 } as initial resource states.CSK refers to the extrapolated cost to implement these gates using the results from [16].
fline cost (dotted).For all practical accuracies, the online cost of our proposed scheme is consistently the smallest.However, the offline cost becomes advantageous when < 8.71 × 10 −4 for Z-rotations and < 2.67 × 10 −7 for random unitaries.Fig. 18 plots the same for the scheme with additional initial resource states.Similarly, the offline cost becomes advantageous when < 4.41 × 10 −4 for Z-rotations and < 1.03 × 10 −6 for random unitaries.

VI. CONCLUSION
We have proposed an alternative protocol to Solovay-Kitaev decomposition that results in significantly smaller resource costs, in both the number of required resource states and the depth of the circuit.We have shown a significant improvement on average in the value of c, and in many cases the number of distilled states and rotations required to implement a single-qubit unitary gate are reduced.Another advantage of our protocol is that the number of resources required is a "smoother" function of accuracy, whereas Solovay-Kitaev decomposition is step-like in nature because of the recursion process used in practice.However, note that our protocols and Solovay-Kitaev decomposition are not exclusive.It might be that some unitaries are better implemented using Solovay-Kitaev decomposition, while our scheme is better suited for Z-rotations, which occur, among other algorithms, in the quantum Fourier transform.
As future research, there are likely a variety of other circuits that enable other "ladders" of states.One natural extension would be to use the SH eigenstates distilled using the protocols of [11,13].Another extension would be to perform a systematic study of "small" Clifford circuits.Finally, we note that implementing a rotation by choosing the state which results in an angle closest to the target angle is a simple way of achieving our goal, but it is surely suboptimal.An important research direction would be to optimize the sequence of angles required to implement the desired rotation.

FIG. 3 .
FIG. 3. Process of obtaining other non-stabilizer states from initial |H states.A copy of |Hi and |H0 probabilistically yield a copy of |Hi−1 or |Hi+1 using the circuit of Fig. 2. Each step along the ladder costs one copy of |H0 , except the first one which costs two.

FIG. 6 .
FIG. 6. Target accuracies are chosen such that 10 −12 < < 10 −4 and the sample size is ∼ 1.8 × 10 4 .The clouds of points are used to fit the data according to a linear fit.We obtain c ∼ 1.29 and c ∼ 2.27.
FIG. 14. Target accuracies are chosen such that 10 −12 < < 10 −4 and the sample size is ∼ 1.8 × 10 3 .(a) The cloud of points is averaged to get the expected number of online rotations.(b) The cloud of point is used to perform a linear fit.

C
FIG. 17. Full line: Cost of Solovay Kitaev decompostion (SKD) of (a) random Z-rotations and (b) random unitaries as a function of the precision .Dotted line: Offline costs of (a) random Z-rotations and (b) random unitaries.We assume that, loosely speaking, a random unitary is the composition of three random rotations, hence the additional factor of three.Dashed line: Online cost of (a) random Z-rotations and (b) random unitaries.(a) For practical values of , the online cost is significantly smaller than SK.The offline cost is lower than SK when ≤ 8.71 × 10 −4 .(b) Again, for practical values of , the online cost is significantly smaller than SK.The offline cost is lower than SK when ≤ 2.67 × 10 −7 .
FIG. 18. Full line: Cost of Solovay Kitaev decompostion (SKD) of (a) random Z-rotations and (b) random unitaries as a function of the precision .Dotted line: Offline costs, using {|H , ψ 0 , ψ 1 , ψ 2 } as initial resources, of (a) random Z-rotations and (b) random unitaries.Dashed line: Online cost, using {|H , ψ 0 , ψ 1 , ψ 2 } as initial resources, of (a) random Z-rotations and (b) random unitaries.(a) For practical values of , the online cost is significantly smaller than SK.The offline cost is lower than SK when ≤ 4.41 × 10 −4 .(b) Again, for practical values of , the online cost is significantly smaller than SK.The offline cost is lower than SK when ≤ 1.03 × 10 −6 .
1. Circuit randomly implementing a rotation of angle ±θ around the Z(X)-axis.andomitthe cost of the initial distillation of the |H state.We concentrate on single-qubit states found in either the XZ-or XY -plane of the Bloch sphere.Note that one can easily move a state from one plane to the other by the application of the Clifford HSHX operation.Suppose we start with states |Z(θ) and |ψ : Two-qubit circuit used to obtain new |Hi states from initial resource states |H0 .Upon measuring the 0 outcome, the output state is |Hi+1 .Upon measuring the 1 outcome, the output state is |Hi−1 .

TABLE V .
Con and C off are respectively the online and offline costs to implement the Z-rotation by angle θ using only |H states, to precision .C on and C off